ASYMPTOTIC REDUNDANCIES FOR UNIVERSAL 
QUANTUM CODING 



CHRISTIAN KRATTENTHALER AND PAUL B. SLATER 

Abstract. Clarke and Barron have recently shown that the Jeffreys' invariant
prior of Bayesian theory yields the common asymptotic (minimax and maximin)
redundancy of universal data compression in a parametric setting. We seek a possible
analogue of this result for the two-level quantum systems. We restrict our
considerations to prior probability distributions belonging to a certain one-parameter
family, $q(u)$, $-\infty < u < 1$. Within this setting, we are able to compute exact
redundancy formulas, for which we find the asymptotic limits. We compare our quantum
asymptotic redundancy formulas to those derived by naively applying the classical
counterparts of Clarke and Barron, and find certain common features. Our results
are based on formulas we obtain for the eigenvalues and eigenvectors of $2^n \times 2^n$
(Bayesian density) matrices, $\zeta_n(u)$. These matrices are the weighted averages (with
respect to $q(u)$) of all possible tensor products of $n$ identical $2 \times 2$ density matrices,
representing the two-level quantum systems. We propose a form of universal coding
for the situation in which the density matrix describing an ensemble of quantum
signal states is unknown. A sequence of $n$ signals would be projected onto the
dominant eigenspaces of $\zeta_n(u)$.



1. Introduction 

A theorem has recently been proven [?] (cf. [?, 19]), in the context of
quantum information theory [?, ?], that is analogous to the noiseless coding theorem
of classical information theory. In the quantum result, the von Neumann entropy [39, 58]

\[ S(\rho) = -\mathrm{Tr}\,\rho \log \rho \tag{1.1} \]

(equalling the Shannon entropy of the probability distribution formed by the
eigenvalues of $\rho$) of the density matrix,

\[ \rho = \sum_a p(a)\,\pi_a, \tag{1.2} \]

describing an ensemble of pure quantum signal states, is equal to $\log 2 \approx 0.693147$ times
the number of quantum bits ("qubits"), that is, the number of two-dimensional
Hilbert spaces, necessary to represent the signal faithfully. (Although the binary
logarithm is usually used in the quantum coding literature, we employ the natural
logarithm throughout this paper, chiefly to facilitate comparisons of our results with
those of Clarke and Barron [?, ?]. $p(a)$ is the probability of the message $a$ from
a particular source coded into a "signal state" of a quantum system $M$, having a state
vector denoted by the ket $|a_M\rangle$. The density matrices $\pi_a$ are the projections
$\pi_a = |a_M\rangle\langle a_M|$, with $\langle a_M|$ being a bra in the dual Hilbert space.)

The proof of the quantum coding theorem is based on the existence of a "typical
subspace" $\Lambda$ of the $2^n$-dimensional Hilbert space of $n$ qubits, which has the property
that, with high probability, a sample of $n$ qubits has almost unit projection onto
$\Lambda$. Since it has been shown that the dimension of $\Lambda$ is $e^{nS(\rho)}$, the operation that
the data compressor (a unitary transformation mapping $n$-qubit strings to $n$-qubit
strings) should perform involves "transposing" the subspace $\Lambda$ into the Hilbert space
of a smaller block of $nS(\rho)/0.693147$ qubits [19]. (Lo [?] has generalized this work
for an ensemble of mixed quantum signal states.)

In this study we dispense with the assumption that a priori information (other
than its dimensionality) is available regarding $\rho$. Somewhat similarly motivated,
Calderbank and Shor [?] modified the definition of fidelity (a measure of the success
of transmission of quantum states) because "previous papers discuss channels that
transmit some distribution of states given a priori, whereas we want our channel to
faithfully transmit any pure input state". They took as their measure the fidelity
for the pure state transmitted least faithfully.



Proceeding in a noninformative Bayesian framework [?, ?, ?, ?], we seek to
extend to the two-level quantum systems recent results of Clarke and Barron [16, 17, 18]
giving various forms of the asymptotic redundancy of universal data compression
for parameterized families of probability distributions. "The redundancy is the excess
of the [coding] cost over the entropy. The goal of data compression is to diminish
redundancy" ([?]; reviewed in [?]). "The idea of universal coding, suggested by
Kolmogorov, is to construct a code for data sequences such that asymptotically, as the
length of the sequence increases, the mean per symbol code length would approach the
entropy of whatever process in a family has generated the data" [?]. For an extensive
commentary on the results of Clarke and Barron, see [?]. Also see [?] for some
recent related research, as well as a discussion of various rationales that have been
employed for using the (classical) Jeffreys' prior (a possible quantum counterpart
of which will be of interest here) for Bayesian purposes, cf. [?]. Let us also bring
to the attention of the reader that in a brief review of [?], the noted statistician
I. J. Good commented that Clarke and Barron "have presumably overlooked the
reviewer's work" and cited, in this regard, [?]. (It should be noted that in these
papers, Good uses a more general objective function, a two-parameter utility,
than the relative entropy, chosen by Clarke and Barron over alternative measures [?,
p. 454]. Good does conclude that Jeffreys' invariant prior is the minimax, that is,
the least favorable, prior when the utility is the "weight of the evidence" in the sense
of C. S. Peirce, that is, the relative entropy.)

Clarke and Barron [16, 17, 18] found the asymptotic redundancy to be given by

\[ \frac{d}{2}\log\frac{n}{2\pi e} + \frac{1}{2}\log\det I(\theta) - \log w(\theta) + o(1). \tag{1.3} \]

Here, $\theta$ is a $d$-dimensional vector of variables parameterizing a family (manifold) of
probability distributions. $I(\theta)$ is the $d \times d$ Fisher information matrix (the negative
of the expected value of the Hessian of the logarithm of the density function) and
$w(\theta)$ is the prior density. The asymptotic minimax redundancy was shown to be

\[ \frac{d}{2}\log\frac{n}{2\pi e} + \log\int_K \sqrt{\det I(\theta)}\,d\theta + o(1), \tag{1.4} \]

where $K$ is a compact set in the interior of the domain of the parameters.

In this investigation, instead of probability densities as in [16, 17, 18], we employ
density matrices (nonnegative definite Hermitian matrices of unit trace) and instead
of the classical form of the relative entropy (the Kullback-Leibler information
measure), its quantum counterpart [?, ?] (cf. [?]),

\[ S(\rho_1, \rho_2) = \mathrm{Tr}\,\rho_1(\log\rho_1 - \log\rho_2), \tag{1.5} \]

that is, the relative entropy of the density matrix $\rho_1$ with respect to $\rho_2$.

The three-dimensional convex set of $2 \times 2$ density matrices that will be the focus
of our study has members representable in the form,

\[ \rho = \frac{1}{2}\begin{pmatrix} 1+z & x-iy \\ x+iy & 1-z \end{pmatrix}. \tag{1.6} \]

Such matrices correspond, in a one-to-one fashion, to the standard (complex)
two-level quantum systems, notably, those of spin-1/2 (electrons, protons, ...) and
massless spin-1 particles (photons). (If we set $x = y = 0$ in (1.6), we recover a
classical binomial distribution, with the probability of "success", say, being $(1+z)/2$
and of "failure", $(1-z)/2$. Setting either $x$ or $y$ to zero puts us in the framework of
real, as opposed to complex, quantum mechanics.) The points $(x, y, z)$ must lie
within the unit ball ("Bloch sphere" [11]), $x^2 + y^2 + z^2 \le 1$, due to the requirement
for $\rho$ of nonnegative eigenvalues. (The points on the bounding spherical surface,
$x^2 + y^2 + z^2 = 1$, corresponding to the pure states, will be shown to exhibit nongeneric
behavior, see (2.38) and the respective comments in Sec. ?? (cf. [?]).) We have, for
(1.6), using spherical coordinates $(r, \vartheta, \varphi)$, so that $r = (x^2 + y^2 + z^2)^{1/2}$,

\[ S(\rho) = -\frac{1-r}{2}\log\frac{1-r}{2} - \frac{1+r}{2}\log\frac{1+r}{2}. \tag{1.7} \]

A composite system of $n$ identical independent (unentangled) two-level quantum
systems is represented by the $2^n \times 2^n$ density matrix $\overset{n}{\otimes}\rho$, possessing a von Neumann
entropy $nS(\rho)$ [?]. (In noncommutative probability theory, independence can be
based on free products instead of tensor products [?].) Along with the real and
complex forms of quantum mechanics, a quaternionic version exists [?], for which
the [presumed] quantum Jeffreys' prior has been found for the two-level systems,
corresponding to the five-dimensional unit ball/"Bloch sphere" [?]. However, the
definition of a tensor product is somewhat problematical in this context [?].
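As a small numerical illustration of (1.7) and of the additivity $nS(\rho)$ just mentioned, the following Python sketch evaluates the entropy from the Bloch radius. The function names are illustrative only and are not taken from the paper; the convention $0 \log 0 = 0$ is assumed for pure states.

```python
import math

def qubit_entropy(r):
    """Von Neumann entropy (natural log) of a 2x2 density matrix with Bloch radius r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("Bloch radius must lie in [0, 1]")
    total = 0.0
    for p in ((1.0 - r) / 2.0, (1.0 + r) / 2.0):  # the two eigenvalues of rho
        if p > 0.0:                                # assumed convention: 0 * log 0 = 0
            total -= p * math.log(p)
    return total

def qubits_needed(n, r):
    """n * S(rho) / log 2: qubits needed for a block of n signals (nonuniversal case)."""
    return n * qubit_entropy(r) / math.log(2.0)
```

For the fully mixed state ($r = 0$) no compression is possible, so `qubits_needed(n, 0.0)` returns `n` up to rounding, while for a pure state ($r = 1$) the entropy vanishes.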

In [?] it was argued that the quantum Fisher information matrix (requiring,
due to noncommutativity, the computation of symmetric logarithmic derivatives
[42]) for the density matrices (1.6) should be taken to be of the form

\[ I(\theta) = \frac{1}{1-x^2-y^2-z^2}\begin{pmatrix} 1-y^2-z^2 & xy & xz \\ xy & 1-x^2-z^2 & yz \\ xz & yz & 1-x^2-y^2 \end{pmatrix}. \tag{1.8} \]

The quantum counterpart of the Jeffreys' prior was, then, taken to be the normalized
form (dividing by $\pi^2$) of the square root of the determinant of (1.8), that is,

\[ \frac{1}{\pi^2\,(1-x^2-y^2-z^2)^{1/2}}. \tag{1.9} \]

Analogously, the classical Jeffreys' prior is proportional to the square root of the
determinant of the classical Fisher information matrix [?].

On the basis of the result of Clarke and Barron [?] that the Jeffreys' prior
yields the asymptotic common (minimax and maximin) redundancy (that is, the
least favorable and reference priors are the same), it was conjectured that its
assumed quantum counterpart (1.9) would have similar properties, as well. (The
Jeffreys' prior has been "shown to be a minimax solution in a - two person -
zero sum game, where the statistician chooses the 'non-informative' prior and nature
chooses the 'true' prior" [?]. Quantum mechanics itself has been asserted to arise
from a Fisher-information transfer zero sum game [?].) To examine this possibility,
(1.9) was embedded as a specific member ($u = .5$) of a one-parameter family of
spherically-symmetric/unitarily-invariant probability densities,

\[ q(u) = \frac{\Gamma(5/2-u)}{\pi^{3/2}\,\Gamma(1-u)\,(1-x^2-y^2-z^2)^{u}}, \qquad -\infty < u < 1. \tag{1.10} \]

(Under unitary transformations of $\rho$, the assigned probability is invariant.) For $u = 0$,
we obtain a uniform distribution over the unit ball. (This has been used as a prior over
the two-level quantum systems, at least, in one study [?].) For $u \to 1$, the uniform
distribution over the spherical boundary (the locus of the pure states) is approached.
(This is often employed as a prior, for example [?, ?, ?].) For $u \to -\infty$, a Dirac
distribution concentrated at the origin (corresponding to the fully mixed state) is
approached.

Embeddings of (1.9) in other (possibly, multiparameter) families are, of course,
possible and may be pursued in further research. Ideally, we would aspire to formally
demonstrate, if it is, in fact, so, that (1.9) can be uniquely characterized vis-a-vis
all other possible probability distributions over the unit ball. Due to the present lack
of any such fully rigorous treatment, analogous to that of Clarke and Barron, we rely
upon an exploratory heuristic computational strategy. This involves averaging $\overset{n}{\otimes}\rho$
with respect to $q(u)$. Doing so yields a one-parameter family of $2^n \times 2^n$ Bayesian
density matrices (Bayes codes or estimators [?, 16, ?]), $\zeta_n(u)$, $-\infty < u < 1$,
exhibiting highly interesting properties.
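The normalization of the family (1.10) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the radial substitution $r = \sin t$ (so that $dr = \cos t\,dt$) together with a plain midpoint rule, and the function names and step count are hypothetical choices rather than anything prescribed by the paper.

```python
import math

def q_density(u, r):
    """Density (1.10) at Bloch radius r < 1, for a parameter u < 1."""
    norm = math.gamma(2.5 - u) / (math.pi ** 1.5 * math.gamma(1.0 - u))
    return norm / (1.0 - r * r) ** u

def total_mass(u, steps=20000):
    """Integrate q(u) over the unit ball using r = sin(t) and the midpoint rule in t."""
    h = (math.pi / 2.0) / steps
    acc = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        r = math.sin(t)
        # volume element 4*pi*r^2 dr with dr = cos(t) dt
        acc += 4.0 * math.pi * r * r * q_density(u, r) * math.cos(t) * h
    return acc
```

For $u = 1/2$ the density at the origin reduces to $1/\pi^2$, in agreement with (1.9), and for $u = 0$ it is the uniform density $3/(4\pi)$ on the unit ball.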

We explicitly find (in Sec. 2) the eigenvalues and eigenvectors of the matrices $\zeta_n(u)$
and determine the relative entropy (1.5) of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$. We do this
by using identities for hypergeometric series and some combinatorics. (It is also
possible to obtain some of our results by making use of representation theory of
SU(2). An even more general result was derived by combining these two approaches.
We comment on this issue at the end of Sec. ??.)

The matrices $\zeta_n(u)$ should prove useful for the universal version of Schumacher
data compression [?, 19, ?, ?], by projecting blocks of $n$ signals (qubits) onto those
"typical" subspaces of $2^n$-dimensional Hilbert space corresponding to as many of
the dominant eigenvalues of $\zeta_n(u)$ as it takes to exceed a sum $1 - \epsilon$. (This can be
accomplished by a unitary transformation, the inverse of which would be used in the
decoding step [?]. In the corresponding nonuniversal quantum coding context, the
projection onto the dominant eigenvalues of $\overset{n}{\otimes}\rho$ yields fidelity greater than $1 - 2\epsilon$ [?]
and distortion less than $2\epsilon$ [?], cf. [?].) For all $u$, the leading one of the $\lfloor n/2 \rfloor + 1$
distinct eigenvalues has multiplicity $n + 1$, and belongs to the $(n+1)$-dimensional
(Bose-Einstein) symmetric subspace [?]. (Projection onto the symmetric subspace
has been proposed as a method for stabilizing quantum computations, including
quantum state storage [?].) For $u = 1/2$, the leading eigenvalue can be obtained by
dividing the $(n+1)$-st Catalan number, that is, $\frac{1}{n+2}\binom{2(n+1)}{n+1}$, by $4^n$. (The Catalan
numbers "are probably the most frequently occurring combinatorial numbers after
the binomial coefficients" [?].)
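The Catalan-number statement can be checked numerically against the eigenvalue formula of Theorem 2 with $d = 0$. The sketch below assumes that formula in the form $\lambda_0 = 2^{-n}\,\Gamma(5/2-u)\,\Gamma(2+n-u) / (\Gamma(5/2+n/2-u)\,\Gamma(2+n/2-u))$; the function names are illustrative, and log-gamma values are used only to avoid overflow.

```python
import math

def leading_eigenvalue(n, u):
    """Leading eigenvalue of zeta_n(u), assumed from the d = 0 case of Theorem 2."""
    log_val = (math.lgamma(2.5 - u) + math.lgamma(2.0 + n - u)
               - math.lgamma(2.5 + n / 2.0 - u) - math.lgamma(2.0 + n / 2.0 - u))
    return math.exp(log_val) / 2.0 ** n

def catalan(m):
    """m-th Catalan number, binom(2m, m) / (m + 1)."""
    return math.comb(2 * m, m) // (m + 1)
```

For instance, for $n = 2$ and $u = 1/2$ both expressions give $5/16$.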



Let us (naively) attempt to apply the formulas of Clarke and Barron [?, ?],
(1.4) and (1.3) above, to the quantum context under investigation here. We do
this by setting $d$ to 3 (the dimensionality of the unit ball, which we take as $K$),
$\det I(\theta)$ to $(1-x^2-y^2-z^2)^{-1}$ (cf. (1.8)), so that $\int_K \sqrt{\det I(\theta)}\,d\theta$ is $\pi^2$, and $w(\theta)$ to
$q(u)$. Then, we obtain from the expression for the asymptotic minimax redundancy (1.4),

\[ \frac{3}{2}(\log n - \log 2 - 1) + \frac{1}{2}\log\pi + o(1), \tag{1.11} \]

and from the expression for the asymptotic redundancy itself (1.3),

\[ \frac{3}{2}(\log n - \log 2 - 1) - \Big(\frac{1}{2} - u\Big)\log(1 - r^2) + \log\Gamma(1-u) - \log\Gamma\Big(\frac{5}{2} - u\Big) + o(1). \tag{1.12} \]
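The two naive formulas are easy to tabulate. The following sketch drops the $o(1)$ terms and uses illustrative function names; it assumes the forms of (1.11) and (1.12) obtained by substituting $d = 3$, $\det I(\theta) = (1-r^2)^{-1}$ and $w(\theta) = q(u)$ into (1.4) and (1.3).

```python
import math

def naive_minimax(n):
    """Formula (1.11) without the o(1) term."""
    return 1.5 * (math.log(n) - math.log(2.0) - 1.0) + 0.5 * math.log(math.pi)

def naive_redundancy(n, u, r):
    """Formula (1.12) without the o(1) term, for u < 1 and 0 <= r < 1."""
    return (1.5 * (math.log(n) - math.log(2.0) - 1.0)
            - (0.5 - u) * math.log(1.0 - r * r)
            + math.lgamma(1.0 - u) - math.lgamma(2.5 - u))
```

At $u = 1/2$ the dependence on $r$ drops out and (1.12) coincides with (1.11), which is the classical equalizing property of the Jeffreys' prior carried over formally.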

We shall (in Sec. ??) compare these two formulas, (1.11) and (1.12), with the results
of Sec. 2 and find some striking similarities and coincidences, particularly associated
with the fully mixed state ($r = 0$). These findings will help to support the working
hypothesis of this study, namely that there are meaningful extensions to the quantum
domain of the (commutative probabilistic) theorems of Clarke and Barron. However,
we find that although the minimax property of the Jeffreys' prior appears to carry
over, the maximin property does not strictly, but only in an approximate sense. In
any case, we cannot formally rule out the possibility that the actual global (perhaps
common) minimax and maximin are achieved for probability distributions not
belonging to the one-parameter family $q(u)$.

Let us point out to the reader the quite recent important work of Petz and Sudar
[42]. They demonstrated that in the quantum case (in contrast to the classical
situation in which there is, as originally shown by Chentsov [14], essentially only one
monotone metric and, therefore, essentially only one form of the Fisher information)
there exists an infinitude of such metrics. "The monotonicity of the Riemannian
metric g is crucial when one likes to imitate the geometrical approach of [Chentsov].
An infinitesimal statistical distance has to be monotone under stochastic mappings.
We note that the monotonicity of g is a strengthening of the concavity of the von
Neumann entropy. Indeed, positive definiteness of g is equivalent to the strict
concavity of the von Neumann entropy . . . and monotonicity is much more than positivity"
[?]. The monotone metrics on the space of density matrices are given by the
operator monotone functions $f(t) : \mathbb{R}^+ \to \mathbb{R}^+$, such that $f(1) = 1$ and $f(t) = t f(1/t)$.
For the choice $f = (1 + t)/2$, one obtains the minimal metric (of the symmetric
logarithmic derivative), which serves as the basis of our analysis here. "In accordance
with the work of Braunstein and Caves, this seems to be the canonical metric of
parameter estimation theory. However, expectation values of certain relevant
observables are known to lead to statistical inference theory provided by the maximum
entropy principle or the minimum relative entropy principle when a priori
information on the state is available. The best prediction is a kind of generalized Gibbs
state. On the manifold of those states, the differentiation of the entropy functional
yields the Kubo-Mori/Bogoliubov metric, which is different from the metric of the
symmetric logarithmic derivative. Therefore, more than one privileged metric shows
up in quantum mechanics. The exact clarification of this point requires and is worth
further studies" [?]. It remains a possibility, then, that a monotone metric other
than the minimal one (which corresponds to $q(.5)$, that is (1.9)) may yield a common
global asymptotic minimax and maximin redundancy, thus, fully paralleling the
classical/nonquantum results of Clarke and Barron [16, 17, 18]. We intend to investigate
such a possibility, in particular, for the Kubo-Mori/Bogoliubov metric [?, ?, ?].



2. Analysis of a One-Parameter Family of Bayesian Density Matrices 

In this section, we implement the analytical approach described in the Introduction
to extending the work of Clarke and Barron [?, 18] to the realm of quantum
mechanics, specifically, the two-level systems. Such systems are representable by density
matrices $\rho$ of the form (1.6). A composite system of $n$ independent (unentangled) and
identical two-level quantum systems is, then, represented by the $n$-fold tensor
product $\overset{n}{\otimes}\rho$. In Theorem 1 of Sec. 2.1, we average $\overset{n}{\otimes}\rho$ with respect to the one-parameter
family of probability densities $q(u)$ defined in (1.10), obtaining the Bayesian density
matrices $\zeta_n(u)$ and formulas for their $2^{2n}$ entries. Then, in Theorem 2 of Sec. 2.2,
we are able to explicitly determine the $2^n$ eigenvalues and eigenvectors of $\zeta_n(u)$.
Using these results, in Sec. 2.3, we compute the relative entropy of $\overset{n}{\otimes}\rho$ with respect
to $\zeta_n(u)$. Then, in Sec. 2.4, we obtain the asymptotics of this relative entropy for
$n \to \infty$. In Sec. 2.5, we compute the asymptotics of the von Neumann entropy (see
(1.1)) of $\zeta_n(u)$. All these results will enable us, in Sec. ??, to ascertain to what extent
the results of Clarke and Barron could be said to carry over to the quantum domain.



2.1. Entries of the Bayesian density matrices $\zeta_n(u)$. The $n$-fold tensor product
$\overset{n}{\otimes}\rho$ is a $2^n \times 2^n$ matrix. To refer to specific rows and columns of $\overset{n}{\otimes}\rho$, we index them by
subsets of the $n$-element set $\{1, 2, \dots, n\}$. We choose to employ this notation instead
of the more familiar use of binary strings, in order to have a more succinct way of
writing our formulas. For convenience, we will subsequently write $[n]$ for $\{1, 2, \dots, n\}$.
Thus, $\overset{n}{\otimes}\rho$ can be written in the form

\[ \overset{n}{\otimes}\rho = (R_{IJ})_{I,J\subseteq[n]}, \]

where

\[ R_{IJ} = \frac{1}{2^n}(1+z)^{n_{\in\in}}(1-z)^{n_{\notin\notin}}(x+iy)^{n_{\notin\in}}(x-iy)^{n_{\in\notin}}, \tag{2.1} \]

with $n_{\in\in}$ denoting the number of elements of $[n]$ contained in both $I$ and $J$, $n_{\notin\notin}$
denoting the number of elements in neither $I$ nor $J$, $n_{\notin\in}$ denoting the number of
elements not in $I$ but in $J$, and $n_{\in\notin}$ denoting the number of elements in $I$ but not in
$J$. In symbols,

\[ n_{\in\in} = |I \cap J|, \qquad n_{\notin\notin} = |[n]\setminus(I \cup J)|, \qquad n_{\notin\in} = |J \setminus I|, \qquad n_{\in\notin} = |I \setminus J|. \]
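The subset indexing can be made concrete with a short sketch. It compares the closed form (2.1) with a direct product of single-qubit entries of (1.6). The convention that "element in the subset" corresponds to the first basis state is an assumption made here for illustration, as are all function names.

```python
from itertools import product

def subsets(n):
    """All subsets of {1, ..., n} as frozensets."""
    for bits in product((False, True), repeat=n):
        yield frozenset(e + 1 for e, b in enumerate(bits) if b)

def rho_entry(row_in, col_in, x, y, z):
    """Entry of the 2x2 matrix (1.6); 'in the subset' is taken as the first basis state."""
    if row_in and col_in:
        return (1 + z) / 2
    if not row_in and not col_in:
        return (1 - z) / 2
    if col_in:                      # element not in I but in J
        return complex(x, y) / 2
    return complex(x, -y) / 2       # element in I but not in J

def tensor_entry(I, J, n, x, y, z):
    """Entry (I, J) of the n-fold tensor power as a product of single-qubit entries."""
    val = 1
    for e in range(1, n + 1):
        val *= rho_entry(e in I, e in J, x, y, z)
    return val

def formula_entry(I, J, n, x, y, z):
    """Entry R_IJ from the closed form (2.1)."""
    n_in_in = len(I & J)
    n_out_out = n - len(I | J)
    n_out_in = len(J - I)
    n_in_out = len(I - J)
    return ((1 + z) ** n_in_in * (1 - z) ** n_out_out
            * complex(x, y) ** n_out_in * complex(x, -y) ** n_in_out) / 2 ** n
```

The diagonal entries sum to one for any point of the unit ball, as they must for a density matrix.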



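We consider the average $\zeta_n(u)$ of $\overset{n}{\otimes}\rho$ with respect to the probability density $q(u)$
defined in (1.10) taken over the unit sphere $\{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$. This
average can be described explicitly as follows.

Theorem 1. The average $\zeta_n(u)$,

\[ \int_{x^2+y^2+z^2\le 1} \Big(\overset{n}{\otimes}\rho\Big)\, q(u)\, dx\, dy\, dz, \]

equals the matrix $(Z_{IJ})_{I,J\subseteq[n]}$, where

\[ Z_{IJ} = \delta_{n_{\notin\in},n_{\in\notin}}\,\frac{1}{2^n}\,
\frac{\Gamma\big(\tfrac52-u\big)\,\Gamma\big(2+\tfrac n2+\tfrac{n_{\in\in}}2-\tfrac{n_{\notin\notin}}2-u\big)\,\Gamma\big(2+\tfrac n2+\tfrac{n_{\notin\notin}}2-\tfrac{n_{\in\in}}2-u\big)\,\Gamma\big(1+\tfrac n2-\tfrac{n_{\in\in}}2-\tfrac{n_{\notin\notin}}2\big)}
{\Gamma\big(\tfrac52+\tfrac n2-u\big)\,\Gamma\big(2+\tfrac n2-u\big)\,\Gamma\big(2+\tfrac n2-\tfrac{n_{\in\in}}2-\tfrac{n_{\notin\notin}}2-u\big)}. \tag{2.2} \]

Here, $\delta_{i,j}$ denotes the Kronecker delta, $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ otherwise.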



Remark. It is important for later considerations to observe that because of the term
$\delta_{n_{\notin\in},n_{\in\notin}}$ in (2.2) the entry $Z_{IJ}$ is nonzero if and only if the sets $I$ and $J$ have the same
cardinality. If $I$ and $J$ have the same cardinality, $c$ say, then $Z_{IJ}$ only depends on
$n_{\in\in}$, the number of common elements of $I$ and $J$, since in this case $n_{\notin\notin}$ is expressible
as $n - 2c + n_{\in\in}$.
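The structure described in the Remark can be explored with a short sketch of the entries $Z_{IJ}$. It assumes the closed form of Theorem 1 written in terms of $a = n_{\in\in}$, $b = n_{\notin\notin}$ and the common value $m = n_{\notin\in} = n_{\in\notin} = (n-a-b)/2$, including the factor $m! = \Gamma(1+m)$ that the proof produces in (2.11). The function name is illustrative.

```python
import math
from itertools import combinations

def z_entry(I, J, n, u):
    """Entry Z_IJ of zeta_n(u); I and J are frozensets of labels from {1, ..., n}."""
    if len(I) != len(J):            # Kronecker delta: n_out_in must equal n_in_out
        return 0.0
    a = len(I & J)                  # n_in_in
    b = n - len(I | J)              # n_out_out
    m = (n - a - b) // 2            # common value of n_out_in and n_in_out
    log_val = (math.lgamma(2.5 - u) + math.lgamma(2 + a + m - u)
               + math.lgamma(2 + b + m - u) + math.lgamma(1 + m)
               - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
               - math.lgamma(2 + m - u))
    return math.exp(log_val) / 2 ** n
```

As a hand-checkable case, for $n = 4$, $u = 1/2$, $I = \{1,2\}$ and $J = \{3,4\}$ the entry is the average of $(x^2+y^2)^2/16$, which equals $1/48$.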



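Proof of Theorem 1. To compute $Z_{IJ}$, we have to compute the integral

\[ \int_{x^2+y^2+z^2\le1} R_{IJ}\, q(u)\, dx\, dy\, dz. \tag{2.3} \]

For convenience, we treat the case that $n_{\notin\notin} \ge n_{\in\in}$ and $n_{\notin\in} \ge n_{\in\notin}$. The other
cases are treated similarly.

First, we rewrite the matrix entries $R_{IJ}$,

\[ \begin{aligned} R_{IJ} &= \frac{1}{2^n}(1-z^2)^{n_{\in\in}}(1-z)^{n_{\notin\notin}-n_{\in\in}}(x^2+y^2)^{n_{\in\notin}}(x+iy)^{n_{\notin\in}-n_{\in\notin}} \\
&= \frac{1}{2^n}\sum_{j,k,l\ge0}(-1)^{j+k}\,i^{l}\binom{n_{\in\in}}{j}\binom{n_{\notin\notin}-n_{\in\in}}{k}\binom{n_{\notin\in}-n_{\in\notin}}{l}\, z^{2j+k}\,(x^2+y^2)^{n_{\in\notin}}\,x^{n_{\notin\in}-n_{\in\notin}-l}\,y^{l}. \end{aligned} \tag{2.4} \]

Of course, in order to compute the integral (2.3), we transform the Cartesian
coordinates into polar coordinates,

\[ x = r\sin\vartheta\cos\varphi, \qquad y = r\sin\vartheta\sin\varphi, \qquad z = r\cos\vartheta, \qquad 0 \le \varphi \le 2\pi, \quad 0 \le \vartheta \le \pi. \]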

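Thus, using (2.4), the integral (2.3) is transformed into

\[ \begin{aligned} \frac{1}{2^n}\sum_{j,k,l\ge0}\int_0^1\!\int_0^\pi\!\int_0^{2\pi} & (-1)^{j+k}\,i^{l}\binom{n_{\in\in}}{j}\binom{n_{\notin\notin}-n_{\in\in}}{k}\binom{n_{\notin\in}-n_{\in\notin}}{l} \\
& \cdot\, r^{2j+k+n_{\notin\in}+n_{\in\notin}+2}\,(\cos^{2j+k}\vartheta)\,(\sin^{n_{\notin\in}+n_{\in\notin}+1}\vartheta) \\
& \cdot\,(\cos^{n_{\notin\in}-n_{\in\notin}-l}\varphi)\,(\sin^{l}\varphi)\,\frac{\Gamma(5/2-u)}{\pi^{3/2}\,\Gamma(1-u)\,(1-r^2)^{u}}\;d\varphi\,d\vartheta\,dr. \end{aligned} \tag{2.5} \]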
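To evaluate this triple integral we use the following standard formulas:

\[ \int_0^{2\pi}\sin^{2M}\varphi\,\cos^{2N}\varphi\;d\varphi = 2\pi\,\frac{(2M-1)!!\,(2N-1)!!}{(2M+2N)!!}, \tag{2.6a} \]

\[ \int_0^{\pi}\sin^{2M+1}\vartheta\,\cos^{2N}\vartheta\;d\vartheta = 2\,\frac{(2M)!!\,(2N-1)!!}{(2M+2N+1)!!}, \tag{2.6b} \]

and

\[ \int_0^{2\pi}\sin^{2M+1}\varphi\,\cos^{2N}\varphi\;d\varphi = 0, \tag{2.6c} \]

\[ \int_0^{2\pi}\sin^{2M}\varphi\,\cos^{2N+1}\varphi\;d\varphi = 0, \tag{2.6d} \]

\[ \int_0^{2\pi}\sin^{2M+1}\varphi\,\cos^{2N+1}\varphi\;d\varphi = 0, \tag{2.6e} \]

for any nonnegative integers $M$ and $N$ (with the convention $(-1)!! = 1$). Furthermore, we need the beta integral

\[ \int_0^1\frac{r^{m}}{(1-r^2)^{u}}\;dr = \frac{\Gamma\big(\frac{m+1}{2}\big)\,\Gamma(1-u)}{2\,\Gamma\big(\frac{m+3}{2}-u\big)}. \tag{2.7} \]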
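Now we consider the integral over $\varphi$ in (2.5). Using (2.6c) and (2.6d), we see that
each summand in (2.5) vanishes if $n_{\notin\in}$ has a parity different from $n_{\in\notin}$. On the other
hand, if $n_{\notin\in}$ has the same parity as $n_{\in\notin}$, then we can evaluate the integrals over $\varphi$
using (2.6a) and (2.6e). Discarding for a moment the terms independent of $\varphi$ and $l$,
we have

\[ \begin{aligned} \sum_{l\ge0}\int_0^{2\pi} i^{l}\binom{n_{\notin\in}-n_{\in\notin}}{l}(\cos^{n_{\notin\in}-n_{\in\notin}-l}\varphi)\,(\sin^{l}\varphi)\;d\varphi
&= \sum_{l\ge0}(-1)^{l}\binom{n_{\notin\in}-n_{\in\notin}}{2l}\,2\pi\,\frac{(2l-1)!!\,(n_{\notin\in}-n_{\in\notin}-2l-1)!!}{(n_{\notin\in}-n_{\in\notin})!!} \\
&= 2\pi\,\frac{(n_{\notin\in}-n_{\in\notin}-1)!!}{(n_{\notin\in}-n_{\in\notin})!!}\sum_{l\ge0}(-1)^{l}\binom{(n_{\notin\in}-n_{\in\notin})/2}{l} \\
&= 2\pi\,\delta_{n_{\notin\in},n_{\in\notin}}, \end{aligned} \]

the last line being due to the binomial theorem. These considerations reduce (2.5) to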



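\[ \begin{aligned} \delta_{n_{\notin\in},n_{\in\notin}}\,\frac{1}{2^n}\sum_{j,k\ge0}\int_0^1\!\int_0^\pi & (-1)^{j+k}\binom{n_{\in\in}}{j}\binom{n_{\notin\notin}-n_{\in\in}}{k} \\
& \cdot\, r^{2j+k+2n_{\notin\in}+2}\,(\cos^{2j+k}\vartheta)\,(\sin^{2n_{\notin\in}+1}\vartheta)\,\frac{2\,\Gamma(5/2-u)}{\pi^{1/2}\,\Gamma(1-u)\,(1-r^2)^{u}}\;d\vartheta\,dr. \end{aligned} \]

Using (2.6b), (2.7) and the fact that the integral over $\vartheta$ vanishes for odd $k$ (the
integrand is then antisymmetric about $\vartheta = \pi/2$), this can be further simplified to

\[ \begin{aligned} \delta_{n_{\notin\in},n_{\in\notin}}\,\frac{1}{2^n}\sum_{j,k\ge0} & (-1)^{j}\binom{n_{\in\in}}{j}\binom{n_{\notin\notin}-n_{\in\in}}{2k}\; 2\,\frac{(2j+2k-1)!!\,(2n_{\notin\in})!!}{(2j+2k+2n_{\notin\in}+1)!!} \\
& \cdot\,\frac{\Gamma(j+k+n_{\notin\in}+3/2)\,\Gamma(1-u)}{2\,\Gamma(j+k+n_{\notin\in}+5/2-u)}\cdot\frac{2\,\Gamma(5/2-u)}{\pi^{1/2}\,\Gamma(1-u)}. \end{aligned} \tag{2.8} \]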



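Next we interchange sums over $j$ and $k$ and write the sum over $j$ in terms of the
standard hypergeometric notation

\[ {}_rF_s\!\left[\begin{matrix} a_1,\dots,a_r \\ b_1,\dots,b_s \end{matrix};\, z\right] = \sum_{k=0}^{\infty}\frac{(a_1)_k\cdots(a_r)_k}{k!\,(b_1)_k\cdots(b_s)_k}\,z^{k}, \]

where the shifted factorial $(a)_k$ is given by $(a)_k := a(a+1)\cdots(a+k-1)$, $k \ge 1$,
$(a)_0 := 1$. Thus we can write (2.8) in the form

\[ \delta_{n_{\notin\in},n_{\in\notin}}\sum_{k\ge0}\binom{n_{\notin\notin}-n_{\in\in}}{2k}\,\frac{(2k-1)!!\;n_{\notin\in}!\;\Gamma(5/2-u)}{2^{n+k}\,\Gamma\big(\tfrac52+k+n_{\notin\in}-u\big)}\cdot{}_2F_1\!\left[\begin{matrix} -n_{\in\in},\ \tfrac12+k \\ \tfrac52+k+n_{\notin\in}-u \end{matrix};\, 1\right]. \tag{2.9} \]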



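The ${}_2F_1$ series can be summed by means of Gauß' ${}_2F_1$ summation (see e.g. [?,
(1.7.6); Appendix (III.3)])

\[ {}_2F_1\!\left[\begin{matrix} a,\ b \\ c \end{matrix};\, 1\right] = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}, \tag{2.10} \]

provided the series terminates or $\mathrm{Re}(c - a - b) > 0$. Applying (2.10) to the ${}_2F_1$ in
(2.9) (observe that it is terminating) and writing the sum over $k$ as a hypergeometric
series, the expression (2.9) becomes

\[ \delta_{n_{\notin\in},n_{\in\notin}}\,\frac{1}{2^n}\,\frac{\Gamma(2+n_{\in\in}+n_{\notin\in}-u)\,\Gamma\big(\tfrac52-u\big)\,n_{\notin\in}!}{\Gamma\big(\tfrac52+n_{\in\in}+n_{\notin\in}-u\big)\,\Gamma(2+n_{\notin\in}-u)}
\times {}_2F_1\!\left[\begin{matrix} \tfrac{n_{\in\in}}2-\tfrac{n_{\notin\notin}}2,\ \tfrac12+\tfrac{n_{\in\in}}2-\tfrac{n_{\notin\notin}}2 \\ \tfrac52+n_{\in\in}+n_{\notin\in}-u \end{matrix};\, 1\right]. \]

Another application of (2.10) gives

\[ \delta_{n_{\notin\in},n_{\in\notin}}\,\frac{1}{2^n}\times\frac{\Gamma(2+n_{\in\in}+n_{\notin\in}-u)\,\Gamma(2+n_{\notin\notin}+n_{\notin\in}-u)\,\Gamma\big(\tfrac52-u\big)\,n_{\notin\in}!}{\Gamma\big(\tfrac52+\tfrac{n_{\in\in}}2+\tfrac{n_{\notin\notin}}2+n_{\notin\in}-u\big)\,\Gamma\big(2+\tfrac{n_{\in\in}}2+\tfrac{n_{\notin\notin}}2+n_{\notin\in}-u\big)\,\Gamma(2+n_{\notin\in}-u)}. \tag{2.11} \]

Trivially, we have $n = n_{\in\in} + n_{\notin\notin} + n_{\notin\in} + n_{\in\notin}$. Since (2.11) vanishes unless $n_{\notin\in} = n_{\in\notin}$,
we can substitute $(n - n_{\in\in} - n_{\notin\notin})/2$ for $n_{\notin\in}$ in the arguments of the gamma functions.
Thus, we see that (2.11) equals (2.2). This completes the proof of the Theorem. □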
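Both uses of the Gauß summation in the proof involve terminating series, for which (2.10) can be checked by direct summation. The sketch below is a numerical illustration only; the function names are hypothetical, and the sample parameters are arbitrary choices of the same shape as those occurring in (2.9).

```python
import math

def hyp2f1_at_one_terminating(neg_m, b, c):
    """Sum the terminating series 2F1[-m, b; c; 1] for a nonnegative integer m = -neg_m."""
    m = -neg_m
    term, total = 1.0, 1.0
    for k in range(m):
        # ratio of consecutive terms: (a + k)(b + k) / ((c + k)(k + 1)) with a = -m
        term *= (neg_m + k) * (b + k) / ((c + k) * (k + 1))
        total += term
    return total

def gauss_rhs(a, b, c):
    """Right-hand side of (2.10): Gamma(c) Gamma(c-a-b) / (Gamma(c-a) Gamma(c-b))."""
    return (math.gamma(c) * math.gamma(c - a - b)
            / (math.gamma(c - a) * math.gamma(c - b)))
```

For example, with $a = -3$, $b = 3/2$, $c = 4.2$ both sides reduce to $(c-b)_3/(c)_3$.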

2.2. Eigenvalues and eigenvectors of the Bayesian density matrices $\zeta_n(u)$.
With the explicit description of the result $\zeta_n(u)$ of averaging $\overset{n}{\otimes}\rho$ with respect to
$q(u)$ at our disposal, we now proceed to describe the eigenvalues and eigenspaces of
$\zeta_n(u)$. The eigenvalues are given in Theorem 2. Lemma 4 gives a complete set of
eigenvectors of $\zeta_n(u)$. The reader should note that, though complete, this is simply
a set of linearly independent eigenvectors and not a fully orthogonal set.

Theorem 2. The eigenvalues of the $2^n \times 2^n$ matrix $\zeta_n(u)$, the entries of which are
given by (2.2), are

\[ \lambda_d = \frac{1}{2^n}\,\frac{\Gamma\big(\tfrac52-u\big)\,\Gamma(2+n-d-u)\,\Gamma(1+d-u)}{\Gamma\big(\tfrac52+\tfrac n2-u\big)\,\Gamma\big(2+\tfrac n2-u\big)\,\Gamma(1-u)}, \qquad d = 0, 1, \dots, \Big\lfloor\frac n2\Big\rfloor, \tag{2.12} \]

with respective multiplicities

\[ \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}. \tag{2.13} \]
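Two consistency checks on Theorem 2 are immediate: the multiplicities must add up to $2^n$, and the eigenvalues weighted by their multiplicities must add up to the trace of a density matrix, namely 1. The sketch below assumes the eigenvalues in the form $\lambda_d = 2^{-n}\,\Gamma(5/2-u)\Gamma(2+n-d-u)\Gamma(1+d-u) / (\Gamma(5/2+n/2-u)\Gamma(2+n/2-u)\Gamma(1-u))$ and the multiplicities $(n-2d+1)^2\binom{n+1}{d}/(n+1)$; function names are illustrative.

```python
import math

def eigenvalue(n, d, u):
    """Eigenvalue lambda_d of zeta_n(u), for u < 1 and 0 <= d <= n // 2."""
    log_val = (math.lgamma(2.5 - u) + math.lgamma(2 + n - d - u)
               + math.lgamma(1 + d - u) - math.lgamma(2.5 + n / 2 - u)
               - math.lgamma(2 + n / 2 - u) - math.lgamma(1 - u))
    return math.exp(log_val) / 2 ** n

def multiplicity(n, d):
    """Multiplicity of lambda_d; the integer division is exact."""
    return (n - 2 * d + 1) ** 2 * math.comb(n + 1, d) // (n + 1)
```

For $n = 2$ this gives the eigenvalue $\frac14(3-u)/(\frac52-u)$ with multiplicity 3 and $\frac14(1-u)/(\frac52-u)$ with multiplicity 1.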



The Theorem will follow from a sequence of Lemmas. We state the Lemmas first,
then prove Theorem 2 assuming the truth of the Lemmas, and after that provide
proofs of the Lemmas.

In the first Lemma some eigenvectors of the matrix $\zeta_n(u)$ are described. Clearly,
since $\zeta_n(u)$ is a $2^n \times 2^n$ matrix, the eigenvectors are in $2^n$-dimensional space. As
we did previously, we index coordinates by subsets of $[n]$, so that a generic vector is
$(x_S)_{S\subseteq[n]}$. In particular, given a subset $T$ of $[n]$, the symbol $e_T$ denotes the standard
unit vector with a 1 in the $T$-th coordinate and 0 elsewhere, i.e., $e_T = (\delta_{S,T})_{S\subseteq[n]}$.

Now let $d$, $s$ be integers with $0 \le d \le s \le n - d$ and let $A$ and $B$ be two disjoint
$d$-element subsets of $[n]$. Then we define the vector $v_{d,s}(A, B)$ by

\[ v_{d,s}(A, B) := \sum_{\substack{X\subseteq A \\ Y\subseteq[n]\setminus(A\cup B),\ |Y|=s-d}} (-1)^{|X|}\, e_{X\cup X'\cup Y}, \tag{2.14} \]

where $X'$ is the "complement of $X$ in $B$" by which we mean that if $X$ consists of the
$i_1$-, $i_2$-, ...-largest elements of $A$, $i_1 < i_2 < \cdots$, then $X'$ consists of all elements of
$B$ except for the $i_1$-, $i_2$-, ...-largest elements of $B$. For example, let $n = 7$. Then
the vector $v_{2,3}(\{1,3\}, \{2,5\})$ is given by

\[ \begin{aligned} & e_{\{2,4,5\}} + e_{\{2,5,6\}} + e_{\{2,5,7\}} - e_{\{1,4,5\}} - e_{\{1,5,6\}} - e_{\{1,5,7\}} \\
& \quad - e_{\{2,3,4\}} - e_{\{2,3,6\}} - e_{\{2,3,7\}} + e_{\{1,3,4\}} + e_{\{1,3,6\}} + e_{\{1,3,7\}}. \end{aligned} \tag{2.15} \]

(In this special case, the possible subsets $X$ of $A = \{1,3\}$ in the sum in (2.14) are
$\emptyset$, $\{1\}$, $\{3\}$, $\{1,3\}$, with corresponding complements in $B = \{2,5\}$ being $\{2,5\}$, $\{5\}$,
$\{2\}$, $\emptyset$, respectively, and the possible sets $Y$ are $\{4\}$, $\{6\}$, $\{7\}$.) Observe that all sets
$X \cup X' \cup Y$ which occur as indices in (2.14) have the same cardinality $s$.
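The definition (2.14) translates directly into code. The following sketch represents a vector as a dictionary from index sets to coefficients; pairing the elements of $A$ and $B$ by rank is how the "complement of $X$ in $B$" is implemented here, and the function name is illustrative.

```python
from itertools import combinations

def eigvec(n, s, A, B):
    """Coefficients of v_{d,s}(A, B) from (2.14) as a dict {frozenset: coefficient}.

    A and B are disjoint d-element collections of labels from {1, ..., n}.
    """
    A, B = sorted(A), sorted(B)
    d = len(A)
    rest = [e for e in range(1, n + 1) if e not in A and e not in B]
    vec = {}
    for size in range(d + 1):
        for pos in combinations(range(d), size):        # ranks of the elements chosen for X
            X = {A[i] for i in pos}
            X_prime = {B[i] for i in range(d) if i not in pos}
            for Y in combinations(rest, s - d):
                key = frozenset(X | X_prime | set(Y))
                vec[key] = vec.get(key, 0) + (-1) ** size
    return vec
```

Calling `eigvec(7, 3, {1, 3}, {2, 5})` reproduces the twelve terms of the example (2.15).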



Lemma 3. Let $d$, $s$ be integers with $0 \le d \le s \le n - d$ and let $A$ and $B$ be disjoint
$d$-element subsets of $[n]$. Then $v_{d,s}(A, B)$ as defined in (2.14) is an eigenvector of the
matrix $\zeta_n(u)$, the entries of which are given by (2.2), for the eigenvalue $\lambda_d$, where $\lambda_d$
is given by (2.12).
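Lemma 3 can be tested numerically for small $n$ by applying the matrix to the vector entry by entry. The sketch below is self-contained and rests on stated assumptions: the entries $Z_{IJ}$ are taken in the form that results from the proof of Theorem 1 (including the factor $n_{\notin\in}!$), the eigenvalues in the form of Theorem 2, and all function names are illustrative.

```python
import math
from itertools import combinations

def z_entry(I, J, n, u):
    """Entry Z_IJ of zeta_n(u), with a = n_in_in, b = n_out_out, m = n_out_in = n_in_out."""
    if len(I) != len(J):
        return 0.0
    a, b = len(I & J), n - len(I | J)
    m = (n - a - b) // 2
    log_val = (math.lgamma(2.5 - u) + math.lgamma(2 + a + m - u)
               + math.lgamma(2 + b + m - u) + math.lgamma(1 + m)
               - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
               - math.lgamma(2 + m - u))
    return math.exp(log_val) / 2 ** n

def eigenvalue(n, d, u):
    """Eigenvalue lambda_d as stated in Theorem 2."""
    log_val = (math.lgamma(2.5 - u) + math.lgamma(2 + n - d - u)
               + math.lgamma(1 + d - u) - math.lgamma(2.5 + n / 2 - u)
               - math.lgamma(2 + n / 2 - u) - math.lgamma(1 - u))
    return math.exp(log_val) / 2 ** n

def eigvec(n, s, A, B):
    """v_{d,s}(A, B) as a dict {frozenset: coefficient}; A and B are paired by rank."""
    A, B = sorted(A), sorted(B)
    d = len(A)
    rest = [e for e in range(1, n + 1) if e not in A and e not in B]
    vec = {}
    for size in range(d + 1):
        for pos in combinations(range(d), size):
            core = {A[i] for i in pos} | {B[i] for i in range(d) if i not in pos}
            for Y in combinations(rest, s - d):
                key = frozenset(core | set(Y))
                vec[key] = vec.get(key, 0) + (-1) ** size
    return vec

def residual(n, s, A, B, u):
    """Largest component of zeta_n(u) v - lambda_d v over all subsets I of [n]."""
    v = eigvec(n, s, A, B)
    lam = eigenvalue(n, len(A), u)
    worst = 0.0
    for size in range(n + 1):
        for I in combinations(range(1, n + 1), size):
            I = frozenset(I)
            image = sum(z_entry(I, J, n, u) * c for J, c in v.items())
            worst = max(worst, abs(image - lam * v.get(I, 0)))
    return worst
```

A residual at the level of rounding error for several choices of $n$, $s$, $A$, $B$ and $u$ is consistent with the Lemma, although of course it does not replace the proof.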



We want to show that the multiplicity of $\lambda_d$ equals the expression in (2.13). Of
course, Lemma 3 gives many more eigenvectors for $\lambda_d$. Therefore, in order to describe






a basis for the corresponding eigenspace, we have to restrict the collection of vectors
in Lemma 3.

We do this in the following way. Fix $d$, $0 \le d \le \lfloor n/2 \rfloor$. Let $P$ be a lattice path
in the plane integer lattice $\mathbb{Z}^2$, starting in $(0,0)$, consisting of $n - d$ up-steps $(1,1)$
and $d$ down-steps $(1,-1)$, which never goes below the $x$-axis. Figure 1 displays an
example with $n = 7$ and $d = 2$. Clearly, the end point of $P$ is $(n, n - 2d)$. We call
a lattice path which starts in $(0,0)$ and never goes below the $x$-axis a ballot path.
(This terminology is motivated by its relation to the (two-candidate) ballot problem,
see e.g. [?, Ch. 1, Sec. 1]. An alternative term for ballot path which is often used is
"Dyck path", see e.g. [?, p. 1-12].) We will use the abbreviation "b.p." for "ballot
path" in displayed formulas.



Given such a lattice path $P$, label the steps from 1 to $n$, as is indicated in Figure 1.
Then define $A_P$ to be the set of all labels corresponding to the first $d$ up-steps of $P$ and
$B_P$ to be the set of all labels corresponding to the $d$ down-steps of $P$. In the example
of Figure 1 we have for the choice $d = 2$ that $A_P = \{1, 3\}$ and $B_P = \{2, 5\}$. Thus,
to each $d$ and $s$, $0 \le d \le s \le n - d$, and $P$ as above we can associate the vector
$v_{d,s}(A_P, B_P)$. In our running example of Figure 1 the vector $v_{2,3}(P)$ would hence be
$v_{2,3}(\{1,3\}, \{2,5\})$, the vector in (2.15). To have a more concise form of notation, we
will write $v_{d,s}(P)$ for $v_{d,s}(A_P, B_P)$ from now on.

Lemma 4. The set of vectors

\[ \{ v_{d,s}(P) : 0 \le d \le s \le n - d,\ P \text{ a ballot path from } (0,0) \text{ to } (n, n-2d) \} \tag{2.16} \]

is linearly independent.

The final Lemma tells us how many such vectors $v_{d,s}(P)$ there are.

Lemma 5. The number of ballot paths from $(0,0)$ to $(n, n - 2d)$ is $\frac{n-2d+1}{n+1}\binom{n+1}{d}$. The
total number of all vectors in the set (2.16) is $2^n$.
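Both statements of Lemma 5 can be confirmed by brute force for small $n$. The sketch below enumerates all placements of the $d$ down-steps and keeps those paths that never go below the $x$-axis; the function names are illustrative.

```python
from itertools import combinations
from math import comb

def count_ballot_paths(n, d):
    """Brute-force count of paths with n - d up-steps and d down-steps that stay >= 0."""
    count = 0
    for downs in combinations(range(n), d):   # positions of the down-steps
        down_set = set(downs)
        height, ok = 0, True
        for step in range(n):
            height += -1 if step in down_set else 1
            if height < 0:
                ok = False
                break
        count += ok
    return count

def ballot_formula(n, d):
    """Closed form from Lemma 5; the integer division is exact."""
    return (n - 2 * d + 1) * comb(n + 1, d) // (n + 1)
```

For the running example $n = 7$, $d = 2$ there are 14 ballot paths, and each of them carries $n - 2d + 1 = 4$ admissible values of $s$.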

Now, let us for a moment assume that Lemmas 3-5 are already proved. Then,
Theorem 2 follows immediately, as it turns out.

Ballot paths
Figure 1

Proof of Theorem 2. Consider the set of vectors in (2.16). By Lemma 3 we
know that it consists of eigenvectors for the matrix $\zeta_n(u)$. In addition, Lemma 4 tells
us that this set of vectors is linearly independent. Furthermore, by Lemma 5 the
number of vectors in this set is exactly $2^n$, which is the dimension of the space where
all these vectors are contained. Therefore, they must form a basis of the space.

Lemma 3 says more precisely that $v_{d,s}(P)$ is an eigenvector for the eigenvalue $\lambda_d$.
From what we already know, this implies that for fixed $d$ the set

\[ \{ v_{d,s}(P) : d \le s \le n - d,\ P \text{ a ballot path from } (0,0) \text{ to } (n, n-2d) \} \]

forms a basis for the eigenspace corresponding to $\lambda_d$. Therefore, the dimension of the
eigenspace corresponding to $\lambda_d$ equals the number of possible numbers $s$ times the
number of possible lattice paths $P$. This is exactly

\[ (n - 2d + 1)\cdot\frac{n-2d+1}{n+1}\binom{n+1}{d}, \]

the number of possible lattice paths $P$ being given by the first statement of Lemma 5.
This expression equals exactly the expression (2.13). Thus, Theorem 2 is proved.

□

Now we turn to the proofs of the Lemmas. 

Proof of Lemma 3. Let $d$, $s$ and $A$, $B$ be fixed, satisfying the restrictions in the
statement of the Lemma. We have to show that

\[ \zeta_n(u)\cdot v_{d,s}(A, B) = \lambda_d\, v_{d,s}(A, B). \]

Restricting our attention to the $I$-th component, we see from the definition (2.14) of
$v_{d,s}(A, B)$ that we need to establish

\[ \sum_{\substack{X\subseteq A \\ Y\subseteq[n]\setminus(A\cup B),\ |Y|=s-d}} Z_{I,X\cup X'\cup Y}\,(-1)^{|X|} =
\begin{cases} \lambda_d\,(-1)^{|U|} & \text{if } I \text{ is of the form } U\cup U'\cup V \text{ for some } U \text{ and } V,\ U\subseteq A, \\ & V\subseteq[n]\setminus(A\cup B),\ |V| = s-d, \\ 0 & \text{otherwise.} \end{cases} \tag{2.17} \]

We prove (2.17) by a case by case analysis. The first two cases cover the case
"otherwise" in (2.17), the third case treats the first alternative in (2.17).

Case 1. The cardinality of $I$ is different from $s$. As we observed earlier, the
cardinality of any set $X \cup X' \cup Y$ which occurs as index at the left-hand side of (2.17)
equals $s$. The cardinality of $I$ however is different from $s$. As we observed in the
Remark after Theorem 1 this implies that any coefficient $Z_{I,X\cup X'\cup Y}$ on the left-hand
side vanishes. Thus, (2.17) is proved in this case.

Case 2. The cardinality of $I$ equals $s$, but $I$ does not have the form $U \cup U' \cup V$ for
any $U$ and $V$, $U \subseteq A$, $V \subseteq [n]\setminus(A \cup B)$, $|V| = s - d$. Now the sum on the left-hand
side of (2.17) contains nonzero contributions. We have to show that they cancel each
other. We do this by grouping summands in pairs, the sum of each pair being 0.

Consider a set $X \cup X' \cup Y$ which occurs as index at the left-hand side of (2.17).
Let $e$ be minimal such that

either: the $e$-th largest element of $A$ and the $e$-th largest element of $B$ are both
in $I$,

or: the $e$-th largest element of $A$ and the $e$-th largest element of $B$ are both not
in $I$.

That such an $e$ must exist is guaranteed by our assumptions about $I$. Now consider
$X$ and $X'$. If the $e$-th largest element of $A$ is contained in $X$ then the $e$-th largest
element of $B$ is not contained in $X'$, and vice versa. Define a new set $\bar X$ by adding
to $X$ the $e$-th largest element of $A$ if it is not already contained in $X$, respectively by
removing it from $X$ if it is contained in $X$. Then, it is easily checked that

\[ Z_{I,X\cup X'\cup Y} = Z_{I,\bar X\cup \bar X'\cup Y}. \]

On the other hand, we have $(-1)^{|X|} = -(-1)^{|\bar X|}$ since the cardinalities of $X$ and $\bar X$
differ by $\pm1$. Both facts combined give

\[ Z_{I,X\cup X'\cup Y}\,(-1)^{|X|} + Z_{I,\bar X\cup \bar X'\cup Y}\,(-1)^{|\bar X|} = 0. \]

Hence, we have found two summands on the left-hand side of (2.17) which cancel
each other.

Summarizing, this construction finds for any $X$, $Y$ sets $\bar X$, $Y$ such that the
corresponding summands on the left-hand side of (2.17) cancel each other. Moreover, this
construction applied to $\bar X$, $Y$ gives back $X$, $Y$. Hence, what the construction does is
exactly what we claimed, namely it groups the summands into pairs which contribute 0
to the whole sum. Therefore the sum is 0, which establishes (2.17) in this case also.

Case 3. $I$ has the form $U\cup U'\cup V$ for some $U$ and $V$, $U\subseteq A$, $V\subseteq[n]\setminus(A\cup B)$, $|V|=s-d$. This assumption implies in particular that the cardinality of $I$ is $s$. From the Remark after the statement of Theorem we know that in our situation $Z_{I,X\cup X'\cup Y}$ depends only on the number of common elements in $I$ and $X\cup X'\cup Y$. Thus, the left-hand side in (2.17) reduces to

where $N(j,k)$ is the number of sets $X\cup X'\cup Y$, for some $X$ and $Y$, $X\subseteq A$, $Y\subseteq[n]\setminus(A\cup B)$, $|Y|=s-d$, which have $s-k$ elements in common with $I$, and which have $d-j$ elements in common with $I\cap(A\cup B)=U\cup U'$. Clearly, we used expression (2.2) with $n_{\in\in}=s-k$ and $n_{\notin\notin}=n-s-k$.

To determine $N(j,k)$, note first that there are $\binom dj$ possible sets $X\cup X'$ which intersect $U\cup U'$ in exactly $d-j$ elements. Next, let us assume that we already made a choice for $X\cup X'$. In order to determine the number of possible sets $Y$ such that $X\cup X'\cup Y$ has $s-k$ elements in common with $I$, we have to choose $(s-k)-(d-j)=s-d+j-k$ elements from $V$, for which we have $\binom{s-d}{k-j}$ possibilities, and we have to choose $s-d-(s-d+j-k)=k-j$ elements from $[n]\setminus(I\cup A\cup B)$ to obtain a total number of $s$ elements, for which we have $\binom{n-s-d}{k-j}$ possibilities. Hence,

$$N(j,k)=\binom dj\binom{s-d}{k-j}\binom{n-s-d}{k-j}.\tag{2.19}$$

So it remains to evaluate the double sum (2.18), using the expression (2.19) for $N(j,k)$.
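The counting argument behind (2.19) can be checked by brute force on a small instance. The following is only an illustrative sketch: the helper names `pair_set`, `count_brute_force` and `N` are ours, and the pairing of the sorted elements of $A$ and $B$ is our reading of the definition of $X'$ as the "complement of $X$ in $B$".

```python
from itertools import combinations
from math import comb


def pair_set(A, B, X):
    """Return X together with X': the partners in B of the elements of A not in X."""
    chosen = set(X)
    return chosen | {b for a, b in zip(A, B) if a not in chosen}


def count_brute_force(n, A, B, U, V, s, j, k):
    """Count sets X u X' u Y sharing d-j elements with U u U' and s-k elements with I."""
    d = len(A)
    rest = [i for i in range(1, n + 1) if i not in A and i not in B]
    core = pair_set(A, B, U)          # I restricted to A u B, i.e. U u U'
    I = core | set(V)
    total = 0
    for size in range(d + 1):
        for X in combinations(A, size):
            XX = pair_set(A, B, X)
            if len(XX & core) != d - j:
                continue
            for Y in combinations(rest, s - d):
                if len((XX | set(Y)) & I) == s - k:
                    total += 1
    return total


def N(n, d, s, j, k):
    """Closed form (2.19); zero when k < j."""
    if k < j:
        return 0
    return comb(d, j) * comb(s - d, k - j) * comb(n - s - d, k - j)
```

The instance used in the check below ($n=7$, $d=2$, $s=4$) is arbitrary; any disjoint sorted $A$, $B$ of equal size with $V$ chosen outside $A\cup B$ should work the same way.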

We start by writing the sum over $j$ in (2.18) in hypergeometric notation,
^ r(| - m) r(2 + n - s - u) r(2 + s - u) 



r(2-u)r(2 + f -M)r(| + f -n) 

{d - s)k{d - n + s)k r 



fc=0 



il)k{2-u)k 



-3-1^2 



1 — d — k + s,l — d — k + n — 



To the ${}_3F_2$ series we apply a transformation formula of Thomae (see e.g. (3.1.1)]),

$${}_3F_2\!\left[\begin{matrix}a,\,b,\,-m\\ d,\,e\end{matrix};1\right]=\frac{(-b+e)_m}{(e)_m}\;{}_3F_2\!\left[\begin{matrix}-m,\,b,\,-a+d\\ d,\,1+b-e-m\end{matrix};1\right],\tag{2.20}$$

where $m$ is a nonnegative integer. We write the resulting ${}_3F_2$ again as a sum over $j$, then interchange sums over $k$ and $j$, and write the (now) inner sum over $k$ in hypergeometric notation. Thus we obtain



'_l)\u\ ^ ^(2 



u)T{2 + n-s-u)T{2 + s 



u 



2" r(| + f -M)r(2 + f -u)r(2 



u 



00 

j=0 



o? + s)j 



j — n + s, (i 



2+J 



;1 



The ${}_2F_1$ series in this expression is terminating because $d-s$ is a nonpositive integer. Hence, it can be summed by means of Gauß' sum (2.10). Writing the remaining sum over $j$ in hypergeometric notation, the above expression becomes

u)T{2 + n - d - u)T{2 + s 



-1)1^1- 



•5 
^2 



u 



2" r(| + f - u) r(2 + f - m) r(2 + s - c/ 



u 



—d, 1 — d + s 
2-d + s 



Again, the ${}_2F_1$ series is terminating and so is summable by means of (2.10). Thus, we get

$$(-1)^{|U|}\,\frac1{2^n}\,\frac{\Gamma(5/2-u)\,\Gamma(2+n-d-u)\,\Gamma(1+d-u)}{\Gamma(5/2+n/2-u)\,\Gamma(2+n/2-u)\,\Gamma(1-u)},$$

which is exactly the expression (2.12) for $\lambda_d$ times $(-1)^{|U|}$. This proves (2.17) in this case.
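The transformation (2.20) used in Case 3 is an identity between terminating series, so it can be tested in exact rational arithmetic. The sketch below is only an illustration; the function names and the sample parameter values are our own choices, and the parameters are picked so that no denominator vanishes.

```python
from fractions import Fraction


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a+1) ... (a+k-1)."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def hyp_at_one(upper, lower, terms):
    """Sum of the terms k = 0..terms of a hypergeometric series with argument 1."""
    total = Fraction(0)
    for k in range(terms + 1):
        num = Fraction(1)
        for a in upper:
            num *= pochhammer(a, k)
        den = pochhammer(Fraction(1), k)
        for b in lower:
            den *= pochhammer(b, k)
        total += num / den
    return total


def thomae_sides(a, b, d, e, m):
    """Return both sides of (2.20) for a nonnegative integer m."""
    lhs = hyp_at_one([a, b, Fraction(-m)], [d, e], m)
    rhs = (pochhammer(e - b, m) / pochhammer(e, m)
           * hyp_at_one([Fraction(-m), b, d - a], [d, 1 + b - e - m], m))
    return lhs, rhs
```

Because one upper parameter is $-m$, summing up to $k=m$ captures every nonzero term on both sides.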

The proof of Lemma ^ is now complete. □ 

Proof of Lemma 4. We know from Lemma ^ that $v_{d,s}(P)$ lies in the eigenspace for the eigenvalue $\lambda_d$, with $\lambda_d$ being given in (2.12). The $\lambda_d$'s, $d=0,1,\dots,\lfloor n/2\rfloor$, are all distinct, so the corresponding eigenspaces are linearly independent. Therefore it suffices to show that for any fixed $d$ the set of vectors

$$\{v_{d,s}(P) : d\le s\le n-d,\ P \text{ a ballot path from } (0,0) \text{ to } (n,n-2d)\}$$

is linearly independent.

On the other hand, a vector $v_{d,s}(A,B)$ lies in the space spanned by the standard unit vectors $e_T$ with $|T|=s$. Clearly, as $s$ varies, these spaces are linearly independent. Therefore, it suffices to show that for any fixed $d$ and $s$ the set of vectors

$$\{v_{d,s}(P) : P \text{ a ballot path from } (0,0) \text{ to } (n,n-2d)\}$$

is linearly independent.

So, let us fix integers $d$ and $s$ with $0\le d\le s\le n-d$, and let us suppose that there is some vanishing linear combination

$$\sum_{P \text{ b.p. from } (0,0) \text{ to } (n,n-2d)} c_P\,v_{d,s}(P)=0.\tag{2.21}$$

We have to establish that $c_P=0$ for all ballot paths $P$ from $(0,0)$ to $(n,n-2d)$.

We prove this fact by induction on the set of ballot paths from $(0,0)$ to $(n,n-2d)$. In order to make this more precise, we need to impose a certain order on the ballot paths. Given a ballot path $P$ from $(0,0)$ to $(n,n-2d)$, we define its front portion $F_P$ to be the portion of $P$ from the beginning up to and including $P$'s $d$-th up-step. For example, choosing $d=2$, the front portion of the ballot path in Figure 1 is the subpath from $(0,0)$ to $(3,1)$. Note that $F_P$ can be any ballot path starting in $(0,0)$ with $d$ up-steps and less than $d$ down-steps. We order such front portions lexicographically, in the sense that $F_1$ is before $F_2$ if and only if $F_1$ and $F_2$ agree up to some point and then $F_1$ continues with an up-step while $F_2$ continues with a down-step.

Now, here is what we are going to prove: Fix any possible front portion $F$. We shall show that $c_P=0$ for all $P$ with front portion $F_P$ equal to $F$, given that it is already known that $c_{P'}=0$ for all $P'$ with a front portion $F_{P'}$ that is before $F$. Clearly, by induction, this would prove $c_P=0$ for all ballot paths $P$ from $(0,0)$ to $(n,n-2d)$.

Let $F$ be a possible front portion, i.e., a ballot path starting in $(0,0)$ with exactly $d$ up-steps and less than $d$ down-steps. As we did earlier, label the steps of $F$ by $1,2,\dots$, and denote the set of labels corresponding to the down-steps of $F$ by $B_F$. We write $b$ for $|B_F|$, the number of all down-steps of $F$. Observe that then the total number of steps of $F$ is $d+b$.

Now, let $T$ be a fixed $(d-b)$-element subset of $\{d+b+1,d+b+2,\dots,n\}$. Furthermore, let $S$ be a set of the form $S=B_F\cup S_1\cup S_2$, where $S_1\subseteq T$ and $S_2\subseteq\{d+b+1,d+b+2,\dots,n\}\setminus T$, and such that $|S|=s$.

We consider the coefficient of $e_S$ in the left-hand side of (2.21). To determine this coefficient, we have to determine the coefficient of $e_S$ in $v_{d,s}(P)$, for all $P$. We may concentrate on those $P$ whose front portion $F_P$ is equal to or later than $F$, since our induction hypothesis says that $c_P=0$ for all $P$ with $F_P$ before $F$. So, let $P$ be a ballot path from $(0,0)$ to $(n,n-2d)$ with front portion equal to or later than $F$. We claim that the coefficient of $e_S$ in $v_{d,s}(P)$ is zero unless the set $B_P$ of down-steps of $P$ is contained in $S$.

Let the coefficient of $e_S$ in $v_{d,s}(P)$ be nonzero. To establish the claim, we first prove that the front portion $F_P$ of $P$ has to equal $F$. Suppose that this is not the case. Then the front portion of $P$ runs in parallel with $F$ for some time, say for the first $(m-1)$ steps, with some $m\le d+b$, and then $F$ continues with an up-step and $F_P$ continues with a down-step (recall that $F_P$ is equal to or later than $F$). By (2.14) we have

$$v_{d,s}(P)=\sum_{\substack{X\subseteq A_P\\ Y\subseteq[n]\setminus(A_P\cup B_P),\ |Y|=s-d}}(-1)^{|X|}\,e_{X\cup X'\cup Y}.\tag{2.22}$$

We are assuming that the coefficient of $e_S$ in $v_{d,s}(P)$ is nonzero, therefore $S$ must be of the form $S=X\cup X'\cup Y$, with $X$, $Y$ as described in (2.22). We are considering the case that the $m$-th step of $F_P$ is a down-step, whence $m\in B_P$, while the $m$-th step of $F$ is an up-step, whence $m\notin B_F$. By definition of $S$, we have $S\cap\{1,2,\dots,d+b\}=B_F$, whence $m\notin S$.

Summarizing so far, we have $m\in B_P$, $m\notin S$, for some $m\le d+b$, and $S=X\cup X'\cup Y$, for some $X$, $Y$ as described in (2.22). In particular we have $m\notin X'$.

Now recall that $X'$ is the "complement of $X$ in $B_P$". This says in particular that, if $m$ is the $i$-th largest element in $B_P$, then the $i$-th largest element of $A_P$, $a$ say, is an element of $X$, and so of $S$. By construction of $A_P$ and $B_P$, $a$ is smaller than $m$, so in particular $a<d+b$. As we already observed, there holds $S\cap\{1,2,\dots,d+b\}=B_F$, so we have $a\in B_F$, i.e., the $a$-th step of $F$ is a down-step. On the other hand, we assumed that $P$ and $F$ run in parallel for the first $(m-1)$ steps. Since $a\in A_P$, the set of up-steps of $P$, the $a$-th step of $P$ is an up-step. We have $a\le m-1$, therefore the $a$-th step of $F$ must be an up-step also. This is absurd. Therefore, given that the coefficient of $e_S$ in $v_{d,s}(P)$ is nonzero, the front portion $F_P$ of $P$ has to equal $F$.

Now, let $P$ be a ballot path from $(0,0)$ to $(n,n-2d)$ with front portion equal to $F$, and suppose that $S$ has the form $S=X\cup X'\cup Y$, for some $X$, $Y$ as described in (2.22). By definition of the front portion, the set $A_P$ of up-steps of $P$ has the property $A_P\cap\{1,2,\dots,d+b\}=\{1,2,\dots,d+b\}\setminus B_F$. Since $|B_F|=b$, these are the labels of exactly $d$ up-steps. Since the cardinality of $A_P$ is exactly $d$ by definition, we must have $A_P=\{1,2,\dots,d+b\}\setminus B_F$. Because of $S\cap\{1,2,\dots,d+b\}=B_F$, which we already used a number of times, $A_P$ and $S$ are disjoint, which in particular implies that $A_P$ and $X$ are disjoint. However, $X$ is a subset of $A_P$ by definition, so $X$ must be empty. This in turn implies that $X'=B_P$. This says nothing else but that the set $B_P$ of down-steps of $P$ equals $X'$ and so is contained in $S$. This establishes our claim.

In fact, we proved more. We saw that $S$ has the form $S=X\cup X'\cup Y$, with $X=\emptyset$. This implies that the coefficient of $e_S$ in $v_{d,s}(P)$, as given by (2.22), is actually $+1$. Comparison of coefficients of $e_S$ in (2.21) then gives

$$\sum_{\substack{P \text{ b.p. from } (0,0) \text{ to } (n,n-2d)\\ F_P=F,\ B_P\subseteq S}} c_P=0,\tag{2.23}$$

for any $S=B_F\cup S_1\cup S_2$, where $S_1\subseteq T$ and $S_2\subseteq\{d+b+1,d+b+2,\dots,n\}\setminus T$, and such that $|S|=s$.

Now, we sum both sides of (2.23) over all such sets $S$, keeping the cardinality of $S_1$ and $S_2$ fixed, say $|S_1|=d-b-j$, enforcing $|S_2|=s-d+j$, for a fixed $j$, $0\le j\le d-b$. For a fixed ballot path $P$ from $(0,0)$ to $(n,n-2d)$, with front portion $F$, with $d-b-k$ down-steps in $T$, and hence with $k$ down-steps in $\{d+b+1,d+b+2,\dots,n\}\setminus T$, there are $\binom kj$ such sets $S_1\subseteq T$ containing all the $d-b-k$ down-steps of $P$ in $T$, and there are $\binom{n-2d-k}{n-d-s-j}$ such sets $S_2\subseteq\{d+b+1,d+b+2,\dots,n\}\setminus T$ containing all the $k$ down-steps of $P$ in $\{d+b+1,d+b+2,\dots,n\}\setminus T$. Therefore, summing up (2.23) gives

$$\sum_{k\ge0}\binom kj\binom{n-2d-k}{n-d-s-j}\sum_{\substack{P \text{ b.p. from } (0,0) \text{ to } (n,n-2d)\\ F_P=F,\ |B_P\cap T|=d-b-k\\ |B_P\cap(\{d+b+1,d+b+2,\dots,n\}\setminus T)|=k}} c_P=0.\tag{2.24}$$

Denoting the inner sum in (2.24) by $C(k)$, we see that (2.24) represents a non-degenerate triangular system of linear equations for $C(0),C(1),\dots,C(d-b)$. Therefore, all the quantities $C(0),C(1),\dots,C(d-b)$ have to equal 0. In particular, we have $C(0)=0$. Now, $C(0)$ consists of just a single term $c_P$, with $P$ being the ballot path from $(0,0)$ to $(n,n-2d)$, with front portion $F$, and the labels of the $d-b$ down-steps besides those of $F$ being exactly the elements of $T$. Therefore, we have $c_P=0$ for this ballot path. The set $T$ was an arbitrary $(d-b)$-subset of $\{d+b+1,d+b+2,\dots,n\}$. Thus, we have proved $c_P=0$ for any ballot path $P$ from $(0,0)$ to $(n,n-2d)$ with front portion $F$. This completes our induction proof. □
Proof of Lemma 5. That the number of ballot paths from $(0,0)$ to $(n,n-2d)$ equals $\frac{n-2d+1}{n+1}\binom{n+1}d$ is a classical combinatorial result (see e.g. Theorem 1 with $t=1$]). From this it follows that the total number of vectors in the set (2.16) is

$$\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)^2}{(n+1)}\binom{n+1}d.\tag{2.25}$$

To evaluate this sum, note that the summand is invariant under the substitution $d\to n-d+1$. Therefore, extending the range of summation in (2.25) to $d=0,1,\dots,n+1$ and dividing the result by 2 gives the same value. So, the cardinality of the set (2.16) is also given by

$$\frac12\sum_{d=0}^{n+1}\frac{(n-2d+1)^2}{(n+1)}\binom{n+1}d.$$

Using the simple identity

$$\frac{(n-2d+1)^2}{(n+1)}\binom{n+1}d=(n+1)\binom{n+1}d-4n\binom n{d-1}+4n\binom{n-1}{d-2},$$

the last sum can be decomposed into

$$\frac{n+1}2\sum_{d=0}^{n+1}\binom{n+1}d-2n\sum_{d=1}^{n+1}\binom n{d-1}+2n\sum_{d=2}^{n+1}\binom{n-1}{d-2}.$$

Each of these sums can be evaluated by the binomial theorem, and thus the expression reduces to $2^n$. This completes the proof of the Lemma. □
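Both ingredients of this proof, the ballot path count and the evaluation of (2.25) to $2^n$, are easy to confirm numerically for small $n$. The sketch below is illustrative only; the function names are ours, and a ballot path is taken to be a path of up- and down-steps that never goes below height 0.

```python
from math import comb


def ballot_paths(n, d):
    """Count paths with n steps ending at height n - 2d that never go below height 0."""
    heights = {0: 1}
    for _ in range(n):
        nxt = {}
        for h, c in heights.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c          # up-step
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + c      # down-step, staying >= 0
        heights = nxt
    return heights.get(n - 2 * d, 0)


def ballot_formula(n, d):
    """Closed form (n - 2d + 1)/(n + 1) * binom(n + 1, d), which is an integer."""
    return (n - 2 * d + 1) * comb(n + 1, d) // (n + 1)


def basis_size(n):
    """The sum (2.25): (n - 2d + 1) values of s for each ballot path with d down-steps."""
    return sum((n - 2 * d + 1) * ballot_formula(n, d) for d in range(n // 2 + 1))
```

For instance, `basis_size(3)` adds $4\cdot1$ for $d=0$ and $2\cdot2$ for $d=1$, giving $8=2^3$.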

In fact, Theorem ^ can be generalized to a wider class of matrices.
Theorem 6. Let $\zeta_n(u)=(Z_{IJ})_{I,J\in[n]}$ be the $2^n\times2^n$ matrix defined by

Zij := ^ngg.ngg.^^^ ^ ^ ■ /(?^ee - n^t), 

where $n_{\in\in}$, etc., have the same meaning as earlier, and where $f(x)$ is a function of $x$ which is symmetric, i.e., $f(x)=f(-x)$. Then, the eigenvalues of $\zeta_n(u)$ are

$$\lambda_{d,s}=f(n-2s)\,\frac{\Gamma(2+n-d-u)\,\Gamma(1+d-u)}{\Gamma(2+n-s-u)\,\Gamma(2+s-u)\,\Gamma(1-u)},\qquad 0\le d\le s\le n-d,\tag{2.26}$$

with respective multiplicities

$$\frac{n-2d+1}{n+1}\binom{n+1}d,\tag{2.27}$$

independent of $s$.
Proof. The above proof of Theorem ^ has to be adjusted only insignificantly to yield a proof of Theorem 6. In particular, the vector $v_{d,s}(A,B)$ as defined in (2.14) is an eigenvector for $\lambda_{d,s}$, for any two disjoint $d$-element subsets $A$ and $B$ of $[n]$, and the set (2.16) is a basis of eigenvectors for $\zeta_n(u)$. □

2.3. The relative entropies of $\overset{n}{\otimes}\rho$ with respect to the Bayesian density matrices $\zeta_n(u)$. We now apply the preceding results to compute the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$. Utilizing the definition (1.5) of relative entropy and employing the property PPI, BS[ that $S(\overset{n}{\otimes}\rho)=nS(\rho)$, it is given by

$$-nS(\rho)-\operatorname{Tr}\bigl(\overset{n}{\otimes}\rho\cdot\log\zeta_n(u)\bigr).\tag{2.28}$$



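The term $S(\rho)$ has been given in (1.7). Concerning the second term in (2.28), we have the following theorem.

Theorem 7. Let $\zeta_n(u)=(Z_{IJ})_{I,J\in[n]}$ be the matrix with entries $Z_{IJ}$ given in (2.2). Then, we have

$$\operatorname{Tr}\bigl(\overset{n}{\otimes}\rho\cdot\log\zeta_n(u)\bigr)=\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d\frac1{2^{n+1}r}\bigl((1+r)^{n-d+1}(1-r)^d-(1+r)^d(1-r)^{n-d+1}\bigr)\log\lambda_d,\tag{2.29}$$

with $\lambda_d$ as given in (2.12).

Before we move on to the proof, we note that Theorem 7 gives us the following expression for the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$.

Corollary 8. The relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$ equals

$$\frac n2(1-r)\log((1-r)/2)+\frac n2(1+r)\log((1+r)/2)-\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d\frac1{2^{n+1}r}\bigl((1+r)^{n-d+1}(1-r)^d-(1+r)^d(1-r)^{n-d+1}\bigr)\log\lambda_d,\tag{2.30}$$

with $\lambda_d$ as given in (2.12).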

Proof of Theorem 7. One way of determining the trace of a linear operator $L$ is to choose a basis of the vector space, $\{v_I : I\in[n]\}$ say, write the action of $L$ on the basis elements in the form

$$Lv_I=c_Iv_I+\text{linear combination of $v_J$'s, $J\ne I$},$$

and then form the sum $\sum_Ic_I$ of the "diagonal" coefficients, which gives exactly the trace of $L$.

Clearly, we choose as a basis our set (2.16) of eigenvectors for $\zeta_n(u)$. To determine the action of $\overset{n}{\otimes}\rho\cdot\log\zeta_n(u)$ we need only to find the action of $\overset{n}{\otimes}\rho$ on the vectors in the set (2.16). We claim that this action can be described as

$$\bigl(\overset{n}{\otimes}\rho\bigr)\cdot v_{d,s}(P)=\frac1{2^n}\sum_{k\ge j\ge0}(-1)^j\binom dj\binom{s-d}{k-j}\binom{n-s-d}{k-j}(1+z)^{s-k}(1-z)^{n-s-k}(x^2+y^2)^k\cdot v_{d,s}(P)+\text{linear combination of eigenvectors $v_{d',s'}(P')$ with $s'\ne s$},\tag{2.31}$$

for any basis vector $v_{d,s}(P)$ in (2.16).

To see this, consider the $I$-th component of $\bigl(\overset{n}{\otimes}\rho\bigr)\cdot v_{d,s}(P)$, i.e., the coefficient of $e_I$ in $\bigl(\overset{n}{\otimes}\rho\bigr)\cdot v_{d,s}(P)$, $I\in[n]$. By the definition (2.14) of $v_{d,s}(P)$ it equals

$$\sum_{\substack{X\subseteq A_P\\ Y\subseteq[n]\setminus(A_P\cup B_P),\ |Y|=s-d}}R_{I,X\cup X'\cup Y}\,(-1)^{|X|},\tag{2.32}$$

where $R_{IJ}$ denotes the $(I,J)$-entry of $\overset{n}{\otimes}\rho$. (Recall that $R_{IJ}$ is given explicitly in (2.1).) Now, it should be observed that we did a similar calculation already, namely in the proof of Lemma ^. In fact, the expression (2.32) is almost identical with the left-hand side of (2.17). The essential difference is that $Z_{IJ}$ is replaced by $R_{IJ}$ for all $J$ (the nonessential difference is that $A$, $B$ are replaced by $A_P$, $B_P$, respectively). Therefore, we can partially rely upon what was done in the proof of Lemma ^. We distinguish between the same cases as in the proof of Lemma ^.

Case 1. The cardinality of $I$ is different from $s$. We do not have to worry about this case, since $e_I$ then lies in the span of vectors $v_{d',s'}(P')$ with $s'\ne s$, which is taken care of in (2.31).

Case 2. The cardinality of $I$ equals $s$, but $I$ does not have the form $U\cup U'\cup V$ for any $U$ and $V$, $U\subseteq A_P$, $V\subseteq[n]\setminus(A_P\cup B_P)$, $|V|=s-d$. Essentially the same arguments as those in Case 2 in the proof of Lemma ^ show that the term (2.32) vanishes for this choice of $I$. Of course, one has to use the explicit expression (2.1) for $R_{IJ}$.

Case 3. $I$ has the form $U\cup U'\cup V$ for some $U$ and $V$, $U\subseteq A_P$, $V\subseteq[n]\setminus(A_P\cup B_P)$, $|V|=s-d$. In Case 3 in the proof of Lemma ^ we observed that there are $N(j,k)$ sets $X\cup X'\cup Y$, for some $X$ and $Y$, $X\subseteq A_P$, $Y\subseteq[n]\setminus(A_P\cup B_P)$, $|Y|=s-d$, which have $s-k$ elements in common with $I$, and which have $d-j$ elements in common with $I\cap(A_P\cup B_P)=U\cup U'$, where $N(j,k)$ is given by (2.19). Then, using the explicit expression (2.1) for $R_{IJ}$, it is straightforward to see that the expression (2.32) equals

$$(-1)^{|U|}\,\frac1{2^n}\sum_{k\ge j\ge0}(-1)^j\binom dj\binom{s-d}{k-j}\binom{n-s-d}{k-j}(1+z)^{s-k}(1-z)^{n-s-k}(x^2+y^2)^k$$

in this case. This establishes (2.31).

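Now we are in the position to write down an expression for the trace of $\overset{n}{\otimes}\rho\cdot\log\zeta_n(u)$. By Theorem ^ and by (2.31) we have

$$\bigl(\overset{n}{\otimes}\rho\cdot\log\zeta_n(u)\bigr)\cdot v_{d,s}(P)=\frac1{2^n}\sum_{k\ge j\ge0}(-1)^j\binom dj\binom{s-d}{k-j}\binom{n-s-d}{k-j}(1+z)^{s-k}(1-z)^{n-s-k}(x^2+y^2)^k\cdot\log\lambda_d\cdot v_{d,s}(P)+\text{linear combination of eigenvectors $v_{d',s'}(P')$ with $s'\ne s$}.\tag{2.33}$$

From what was said at the beginning of this proof, in order to obtain the trace of $\overset{n}{\otimes}\rho\cdot\log\zeta_n(u)$, we have to form the sum of all the "diagonal" coefficients in (2.33). Using the first statement of Lemma ^ and replacing $x^2+y^2$ by $r^2-z^2$, we see that it is

$$\sum_{d=0}^{\lfloor n/2\rfloor}\log\lambda_d\,\frac{(n-2d+1)}{(n+1)}\binom{n+1}d\frac1{2^n}\sum_{s=d}^{n-d}\sum_{k\ge j\ge0}(-1)^j\binom dj\binom{s-d}{k-j}\binom{n-s-d}{k-j}(1+z)^{s-k}(r^2-z^2)^k(1-z)^{n-s-k}.\tag{2.34}$$

In order to see that this expression equals (2.29), we have to prove

$$\sum_{s=d}^{n-d}\sum_{k\ge j\ge0}(-1)^j\binom dj\binom{s-d}{k-j}\binom{n-s-d}{k-j}(1+z)^{s-k}(r^2-z^2)^k(1-z)^{n-s-k}=\frac1{2r}\bigl((1+r)^{n-d+1}(1-r)^d-(1+r)^d(1-r)^{n-d+1}\bigr).\tag{2.35}$$

We start with the left-hand side of (2.35) and write the inner sum in hypergeometric notation, thus obtaining

$$\sum_{s=d}^{n-d}\sum_{j=0}^{d}(-1)^j\binom dj(1-z)^{n-s-j}(1+z)^{s-j}(r^2-z^2)^j\;{}_2F_1\!\left[\begin{matrix}d-n+s,\,d-s\\ 1\end{matrix};\frac{r^2-z^2}{1-z^2}\right].$$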
To the ${}_2F_1$ series we apply the transformation formula (1.8.10), terminating form]

$${}_2F_1\!\left[\begin{matrix}a,\,-m\\ c\end{matrix};z\right]=\frac{(c-a)_m}{(c)_m}\;{}_2F_1\!\left[\begin{matrix}-m,\,a\\ 1+a-c-m\end{matrix};1-z\right],$$

where $m$ is a nonnegative integer. We write the resulting ${}_2F_1$ series again as a sum over $k$. In the resulting expression we exchange sums so that the sum over $j$ becomes the innermost sum. Thus, we obtain



n—d s—d 



fc/- \n—s—k/-t , \s—k 

- z) (l + z) 



s=d k=0 



{d - s)k{n - d - s + l)s-d {d-n + s)i 
{l)k{l)s-d {2d-n)k 



j=0 



d\ ( — r 



2\ J 



Clearly, the innermost sum can be evaluated by the binomial theorem. Then, we 
interchange sums over s and k. The expression that results is 



[n/2j -d 



2\d+k,, .n-2d-2k{'2d + k - n) 



k=0 



n-2d-2k 
s=0 



n-2d-2k\ fl + z 



Again, we can apply the binomial theorem. Thus, we reduce our expression on the left-hand side of (2.35) to



r,n-2d{, _^2\d Sr^ 2)k V" 2 2)k (^ _ 



„2\fc 



fc=0 



{2d-n)kk\ 



Now, we replace $(1-r^2)^k$ by its binomial expansion $\sum_{l=0}^{k}(-1)^l\binom kl r^{2l}$, interchange sums over $k$ and $l$, and write the (now) inner sum over $k$ in hypergeometric notation. This gives



[n/2i-d 

^ 1=0 



n-2d/i „2^d/ ^ (_X)V2«('^ 2)^(2+^ 2)' 



:i)ii2d-n)i 



2F1 



2d + l-n 
Finally, this ${}_2F_1$ series can be summed by means of Gauß' summation (2.10). Simplifying, we have

$$(1-r^2)^d\sum_{l=0}^{\lfloor n/2\rfloor-d}\binom{n-2d+1}{2l+1}r^{2l},$$

which is easily seen to equal the right-hand side in (2.35). This completes the proof of the Theorem. □
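The identity (2.35) is polynomial in $r$ and $z$, so it can be spot-checked in exact rational arithmetic. The following sketch is only an illustration and relies on our reading of (2.35), with the left-hand side written as a sum over $j$ and $t=k-j$; the function names are ours.

```python
from fractions import Fraction
from math import comb


def lhs_235(n, d, r, z):
    """Left-hand side of (2.35); the inner index is t = k - j."""
    total = Fraction(0)
    for s in range(d, n - d + 1):
        for j in range(d + 1):
            for t in range(min(s - d, n - s - d) + 1):
                k = j + t
                total += ((-1) ** j * comb(d, j) * comb(s - d, t) * comb(n - s - d, t)
                          * (1 + z) ** (s - k) * (r * r - z * z) ** k
                          * (1 - z) ** (n - s - k))
    return total


def rhs_235(n, d, r):
    """Right-hand side of (2.35)."""
    return ((1 + r) ** (n - d + 1) * (1 - r) ** d
            - (1 + r) ** d * (1 - r) ** (n - d + 1)) / (2 * r)
```

Note that the right-hand side does not depend on $z$, which the check below exercises by varying $z$ at fixed $r$.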

2.4. Asymptotics of the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$. In the preceding subsection, we obtained in Corollary 8 the general formula (2.30) for the relative entropy of $\overset{n}{\otimes}\rho$ with respect to the Bayesian density matrix $\zeta_n(u)$. We now proceed to find its asymptotics for $n\to\infty$. We prove the following theorem.

Theorem 9. The asymptotics of the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$ for a fixed $r$ with $0<r<1$ is given by

$$\frac32\log n-\frac12-\frac32\log2-(1-u)\log(1-r^2)+\frac1{2r}\log\Bigl(\frac{1-r}{1+r}\Bigr)+\log\Gamma(1-u)-\log\Gamma(5/2-u)+O\Bigl(\frac1n\Bigr).\tag{2.36}$$

In the case $r=0$, this means that the asymptotics is given by the expression (2.36) in the limit $r\downarrow0$, i.e., by

$$\frac32\log n-\frac32-\frac32\log2+\log\Gamma(1-u)-\log\Gamma(5/2-u)+O\Bigl(\frac1n\Bigr).\tag{2.37}$$

For any fixed $\varepsilon>0$, the $O(.)$ term in (2.36) is uniform in $u$ and $r$ as long as $0\le r\le1-\varepsilon$.

For $r=1$ the asymptotics is given by

$$(2-u)\log n+(2u-3)\log2+\frac12\log\pi-\log\Gamma(5/2-u)+O\Bigl(\frac1n\Bigr).\tag{2.38}$$

Also here, the $O(.)$ term is uniform in $u$.
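For $0<r<1$ the statement can be compared numerically with the exact expression of Corollary 8. The sketch below is an illustration under our reading of (2.12), (2.30) and (2.36); the function names are ours, and the computation is done in logarithms so that large $n$ does not overflow.

```python
import math


def log_lambda(n, d, u):
    """log of the eigenvalue lambda_d, in the form used in (2.39)."""
    return (-n * math.log(2.0)
            + math.lgamma(2.5 - u) + math.lgamma(2 + n - d - u) + math.lgamma(1 + d - u)
            - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
            - math.lgamma(1 - u))


def weight(n, d, r):
    """Coefficient of log(lambda_d) in (2.29), for 0 < r < 1."""
    log_binom = math.lgamma(n + 2) - math.lgamma(d + 1) - math.lgamma(n + 2 - d)
    log_main = (log_binom + (n - d + 1) * math.log((1 + r) / 2)
                + d * math.log((1 - r) / 2))
    ratio = ((1 - r) / (1 + r)) ** (n - 2 * d + 1)
    return (n - 2 * d + 1) / (n + 1) * math.exp(log_main) * (1 - ratio) / r


def relative_entropy(n, r, u):
    """Exact relative entropy (2.30)."""
    p, q = (1 - r) / 2, (1 + r) / 2
    minus_n_entropy = n * (p * math.log(p) + q * math.log(q))
    return minus_n_entropy - sum(weight(n, d, r) * log_lambda(n, d, u)
                                 for d in range(n // 2 + 1))


def asymptotic(n, r, u):
    """Right-hand side of (2.36) without the O(1/n) term."""
    return (1.5 * math.log(n) - 0.5 - 1.5 * math.log(2.0)
            - (1 - u) * math.log(1 - r * r)
            + math.log((1 - r) / (1 + r)) / (2 * r)
            + math.lgamma(1 - u) - math.lgamma(2.5 - u))
```

For $n=1$ the function `relative_entropy` reduces to $\log2-S(\rho)$, as it should, since $\zeta_1(u)$ is half the identity matrix.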

Remark. It is instructive to observe that, although a comparison of (2.36) and (2.38) seems to suggest that the asymptotics of the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$ behaves completely differently for $0\le r<1$ and $r=1$, the two cases are really quite compatible. In fact, letting $r$ tend to 1 in (2.36) shows that (ignoring the error term) the asymptotic expression approaches $+\infty$ for $u<1/2$, $-\infty$ for $u>1/2$, and it approaches $\frac32\log n-\frac12-\frac52\log2+\frac12\log\pi$ for $u=1/2$. This indicates that, for $r=1$, the order of magnitude of the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$ should be larger than $\frac32\log n$ if $u<1/2$, smaller than $\frac32\log n$ if $u>1/2$, and exactly $\frac32\log n$ if $u=1/2$. How much larger or smaller is precisely what formula (2.38) tells us: the order of magnitude is $(2-u)\log n$, and in the case $u=1/2$ the asymptotics is, in fact, $\frac32\log n-2\log2+\frac12\log\pi$.
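The case $r=1$ is simple enough to check directly, since there the relative entropy is just $-\log\lambda_0$. The sketch below is illustrative only; the function names are ours and $\lambda_0$ is taken from our reading of (2.12).

```python
import math


def exact_r1(n, u):
    """-log(lambda_0): the relative entropy (2.30) at r = 1."""
    return (n * math.log(2.0) - math.lgamma(2.5 - u) - math.lgamma(2 + n - u)
            + math.lgamma(2.5 + n / 2 - u) + math.lgamma(2 + n / 2 - u))


def asymptotic_r1(n, u):
    """Right-hand side of (2.38) without the O(1/n) term."""
    return ((2 - u) * math.log(n) + (2 * u - 3) * math.log(2.0)
            + 0.5 * math.log(math.pi) - math.lgamma(2.5 - u))
```

At $u=1/2$ the function `asymptotic_r1` reproduces the expression $\frac32\log n-2\log2+\frac12\log\pi$ quoted in the Remark.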



The proof of Theorem 9 relies on several auxiliary summations and estimations. These are stated and proved separately in Lemmas 10 and 11.

Proof of Theorem 9. We start with the case $0<r<1$. We concentrate first on the sum in (2.30). Because of $\lambda_{n+1-d}=\lambda_d$, we have

$$\frac1{2^{n+1}r}\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d\bigl((1+r)^{n-d+1}(1-r)^d-(1+r)^d(1-r)^{n-d+1}\bigr)\log\lambda_d=\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n-d+1}(1-r)^d\log\lambda_d.$$

We expand the logarithm according to the addition rule to obtain

$$\begin{aligned}\frac1{2^{n+1}r}&\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d\bigl((1+r)^{n-d+1}(1-r)^d-(1+r)^d(1-r)^{n-d+1}\bigr)\log\lambda_d\\ &=\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n-d+1}(1-r)^d\,\log\Bigl(\frac1{2^n}\,\frac{\Gamma(5/2-u)}{\Gamma(5/2+n/2-u)\,\Gamma(2+n/2-u)\,\Gamma(1-u)}\Bigr)\\ &\quad+\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n-d+1}(1-r)^d\,\log\Gamma(1+d-u)\\ &\quad-\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1-r)^{n-d+1}(1+r)^d\,\log\Gamma(1+d-u).\end{aligned}\tag{2.39}$$

The first sum on the right-hand side of (2.39) can be evaluated by means of (2.41). Besides, by Stirling's formula we have

$$\log\Gamma(z)=\Bigl(z-\frac12\Bigr)\log(z)-z+\frac12\log2+\frac12\log\pi+O\Bigl(\frac1z\Bigr).$$
Thus, we get

1 {n-2d+ 1) + 

2^ ^ (n+ 1) V d 

. ((1 + r)"-'^+i(l - r)'^ - (1 + r^il - r)"-'^+i) log A, 
= -nlog2 - logr(5/2 + ra/2 -u) -logr(2 + n/2 - u) + logr(5/2 - u) 

{1/2 - u + d)log{l + d - u) - {1 - u + d) + llog2 + llogTi + O ^ 



2 ° 2 ° 

2"+ir^ (n+1) V / 
(1/2 - M + c/) \og{l + d-u) -{1-u + d) + ^log2 + ^logvr + O . ^ 



(2.40) 

Now, the sums are split into several sums by additivity. Those which arise from the first sum in (2.40) can be evaluated using (2.41), (2.42), (2.43), or approximated using (2.48). Those which arise from the second sum can be evaluated by the same identities and approximations, only with $r$ replaced by its negative. Thus, we obtain

$$\begin{aligned}\frac1{2^{n+1}r}&\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d\bigl((1+r)^{n-d+1}(1-r)^d-(1+r)^d(1-r)^{n-d+1}\bigr)\log\lambda_d\\ &=\frac n2(1-r)\log((1-r)/2)+\frac n2(1+r)\log((1+r)/2)-\frac32\log n+\frac32\log2+\frac12\\ &\quad+(1-u)\log(1-r)+(1-u)\log(1+r)+\frac1{2r}\log\Bigl(\frac{1+r}{1-r}\Bigr)\\ &\quad+\log\Gamma(5/2-u)-\log\Gamma(1-u)+O\Bigl(\frac1n\Bigr).\end{aligned}$$

Finally, use of this in (2.30) gives the claimed asymptotics (2.36) for the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$.

A closer analysis of the error terms shows that they can, in fact, be bounded uniformly in $u$ and $r$, $0<r\le1-\varepsilon$, for any fixed positive $\varepsilon$.

Now we turn to the two exceptional cases $r=0$ and $r=1$.

In the case $r=1$, by (2.30) the relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$ equals

$$\frac n2(1-r)\log((1-r)/2)+\frac n2(1+r)\log((1+r)/2)-\log\lambda_0,$$

$\lambda_0$ being given by (2.12). A straightforward application of Stirling's formula then leads to (2.38).

In the case $r=0$, the relative entropy (2.30) of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(u)$ reduces to

$$\frac n2(1-r)\log((1-r)/2)+\frac n2(1+r)\log((1+r)/2)-\frac1{2^n}\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)^2}{(n+1)}\binom{n+1}d\log\lambda_d.$$

The asymptotics of that expression can be determined in a similar way to what was done for $0<r<1$. For the sake of brevity, we omit the derivation. The result is (2.37). Actually, it is possible to rearrange the computations that we did for $0<r<1$, so that in the limit $r\downarrow0$ they give a proof of (2.37). This last observation justifies the claim that the error term in (2.36) is uniform in $u$ and $r$, $0\le r\le1-\varepsilon$ (i.e., including $r=0$), for any fixed positive $\varepsilon$.

This completes the proof of the Theorem. □

Now, we list the summations which were used in the proof of the Theorem. 

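Lemma 10. We have the following summations:

$$\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n+1-d}(1-r)^d=1.\tag{2.41}$$

$$\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n+1-d}(1-r)^d\,d=\frac{(1-r)(nr-1)}{2r}.\tag{2.42}$$

$$\frac1{2^{n+1}r}\sum_{d=-1}^{n+1}\frac{(n-2d+1)}{(n+1)(n+2)}\binom{n+2}{d+1}(1+r)^{n+1-d}(1-r)^d=\frac{2(1+2r+nr)}{(n+1)(n+2)\,r\,(1-r)}.\tag{2.43}$$

$$\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n+1-d}(1-r)^d\,(1/2-u+d)=\frac{-1+2r+nr-nr^2-2ru}{2r}.\tag{2.44}$$

$$\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n+1-d}(1-r)^d\,(1/2-u+d)\Bigl(1+d-u-\frac{n(1-r)}2\Bigr)=\frac{-5-n+7r+5nr-3nr^2-nr^3+4u-10ru-2nru+2nr^2u+4ru^2}{4r}.\tag{2.45}$$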



2 



■ {1/2 -u + d) (^1 + d-u- 

— — (— 22 — 9n + 26r + 24nr + n r — 5nr —nr —8nr—nr—2nr +nr + 32m 
8r 

+ 4nu — 48™ — 22nru + 12nr^u + Gnr^u — 12u^ + 32™^ + Anrv? — Anr'^u^ — Sr-u^). 

(2.46) 



1 " ' " 
— y 



(n - 2d + 1) /n + 1 



2n+i^ ^ (n+1) \ d 



l + r)''+^-'^(l-r)'^ 
(1/2 -u + d)(l + d-u- 



= — (-92-61n-3n2+100r+105nr+15nV+19nr^-4nV2-35nr^-20nV^-22nr^ 
16r 

+ 7n^r'^ — 6nr^ + hn^r^ + 188-u + GOnw — 228ru — 162nru — Qn^ru + 20nr^M 
+ 6n r -u + 66nr' n + 6n r m + 16nr — 6n r u — 132u — \2nu + 20Aru 
+ 72nru^ - 36nr\^ - 24^^^^ + 32m^ - 88rM^ - 8nra^ + 8nr^M^ + 16ra^). (2.47) 



Proof. In all the cases, the sums can be split into several simpler sums, each of 
which can itself be summed using the binomial theorem. □ 
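Since the summations of Lemma 10 are finite sums of rational terms, they can be confirmed exactly for small $n$. The sketch below checks (2.41) to (2.45) in the form in which we read them, in particular with the sum in (2.43) starting at $d=-1$; the function names are ours.

```python
from fractions import Fraction
from math import comb


def weighted_sum(n, r, g):
    """(1/(2^(n+1) r)) * sum_{d=0}^{n+1} (n-2d+1)/(n+1) C(n+1,d) (1+r)^(n+1-d) (1-r)^d g(d)."""
    total = Fraction(0)
    for d in range(n + 2):
        total += (Fraction(n - 2 * d + 1, n + 1) * comb(n + 1, d)
                  * (1 + r) ** (n + 1 - d) * (1 - r) ** d * g(d))
    return total / (2 ** (n + 1) * r)


def sum_243(n, r):
    """Left-hand side of (2.43), with the sum starting at d = -1."""
    total = Fraction(0)
    for d in range(-1, n + 2):
        total += (Fraction(n - 2 * d + 1, (n + 1) * (n + 2)) * comb(n + 2, d + 1)
                  * (1 + r) ** (n + 1 - d) * (1 - r) ** d)
    return total / (2 ** (n + 1) * r)


def rhs_245(n, r, u):
    """Right-hand side of (2.45)."""
    num = (-5 - n + 7 * r + 5 * n * r - 3 * n * r ** 2 - n * r ** 3 + 4 * u
           - 10 * r * u - 2 * n * r * u + 2 * n * r ** 2 * u + 4 * r * u ** 2)
    return num / (4 * r)
```

All arithmetic uses `Fraction`, so the comparisons are exact rather than approximate.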
Lemma 11. For fixed $r$ with $0<r<1$ we have the following asymptotic expansion:

$$\frac1{2^{n+1}r}\sum_{d=0}^{n+1}\frac{(n-2d+1)}{(n+1)}\binom{n+1}d(1+r)^{n+1-d}(1-r)^d\,(1/2-u+d)\log(1+d-u)=\Bigl(\frac n2(1-r)+1-u-\frac1{2r}\Bigr)\bigl(\log n+\log(1-r)-\log2\bigr)+\frac74-u+\frac r4-\frac1{2r}+O\Bigl(\frac1n\Bigr).\tag{2.48}$$
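The expansion (2.48) can be compared with a direct floating-point evaluation of its left-hand side. The sketch below rests on our reconstruction of the right-hand side of (2.48) from (2.44), (2.45) and the leading term of (2.46), so it should be read as a consistency check of that reconstruction rather than as a verification of the original statement; the function names are ours.

```python
import math


def lhs_248(n, r, u):
    """Left-hand side of (2.48), with the binomial weights evaluated in logarithms."""
    total = 0.0
    for d in range(n + 2):
        log_pmf = (math.lgamma(n + 2) - math.lgamma(d + 1) - math.lgamma(n + 2 - d)
                   + (n + 1 - d) * math.log((1 + r) / 2) + d * math.log((1 - r) / 2))
        w = (n - 2 * d + 1) / ((n + 1) * r) * math.exp(log_pmf)
        total += w * (0.5 - u + d) * math.log(1 + d - u)
    return total


def rhs_248(n, r, u):
    """Right-hand side of (2.48) without the O(1/n) term."""
    big_log = math.log(n) + math.log(1 - r) - math.log(2.0)
    return ((n * (1 - r) / 2 + 1 - u - 1 / (2 * r)) * big_log
            + 7 / 4 - u + r / 4 - 1 / (2 * r))
```

The difference between the two functions should shrink roughly like $1/n$ as $n$ grows, for fixed $r$ and $u$.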



Proof. We start with the expansion 
log(l + d - m) = log ( — - 1 + log ( 1 + 



n{l — r) 
2 / 

logn + log(l - r) - log 2 H \l + d 

n[l — r) \ 



l + d-u 



n{l — r) 



nil — r] 



— u 



1 + d — u 



n{l — r) 



+ 



n^d — r)^ 



1 + d — u 



n{l — r) 



(It is at this point that we must have $r<1$.) If we use this expansion in the left-hand side of (2.48) and subsequently use (2.44)–(2.47) to evaluate the resulting sums, we obtain



k- E '"fa + iV' (" - 



n 



-{l-r) + l-u 



( log n + log(l - r) - log 2) 



Simplifying, we obtain (2.48). □



2.5. Asymptotics of the von Neumann entropies of the Bayesian density matrices $\zeta_n(u)$. The main result of this section describes the asymptotics of the von Neumann entropy (1.1) of $\zeta_n(u)$. In view of the explicit description of the eigenvalues of $\zeta_n(u)$ and their multiplicities in Theorem 0, this entropy equals

$$-\sum_{d=0}^{\lfloor n/2\rfloor}\frac{(n-2d+1)^2}{(n+1)}\binom{n+1}d\lambda_d\log\lambda_d,\tag{2.49}$$

with $\lambda_d$ being given by (2.12).
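The entropy (2.49) is straightforward to evaluate numerically. The sketch below is illustrative only: it uses the eigenvalues in the form we read from (2.12) and the multiplicities $(n-2d+1)^2\binom{n+1}d/(n+1)$, and the function names are ours. Two sanity properties are easy to test: the eigenvalues, counted with multiplicity, sum to 1, and for $n=1$ the entropy is $\log2$.

```python
import math


def log_lambda(n, d, u):
    """log of the eigenvalue lambda_d of the Bayesian density matrix."""
    return (-n * math.log(2.0)
            + math.lgamma(2.5 - u) + math.lgamma(2 + n - d - u) + math.lgamma(1 + d - u)
            - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
            - math.lgamma(1 - u))


def multiplicity(n, d):
    """Multiplicity of lambda_d; the division is exact."""
    return (n - 2 * d + 1) ** 2 * math.comb(n + 1, d) // (n + 1)


def trace(n, u):
    """Sum of all eigenvalues, which should equal 1."""
    return sum(multiplicity(n, d) * math.exp(log_lambda(n, d, u))
               for d in range(n // 2 + 1))


def von_neumann_entropy(n, u):
    """The entropy (2.49), in natural logarithms."""
    return -sum(multiplicity(n, d) * math.exp(log_lambda(n, d, u)) * log_lambda(n, d, u)
                for d in range(n // 2 + 1))
```

The entropy can never exceed $n\log2$, the value for the maximally mixed state on $n$ qubits.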

Theorem 12. We have the following asymptotic expansion: 

" ( 2 (2 111(1 - n) + - - - n)) + ^ logn + (-1 + 2. ) log 2 
14 - 20n + lu' ^ ^^^^ _ _ ^^^^^2 _ u)) 



2 (2 - m) (1 - m) 



+ (2 - 2m)(^(5 - 2m) - ^(1 - m)) + O ( ^ ) , (2.50) 



where $\psi(x)$ is the digamma function,

$$\psi(x)=\frac{\frac{d}{dx}\Gamma(x)}{\Gamma(x)}.$$



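The proof of the Theorem depends on a few summations, which we now list.
Lemma 13. We have the following summations:

$$\sum_{d=0}^{n+1}\frac{(n-2d+1)^2}{(n+1)}\binom{n+1}d\lambda_d=2.\tag{2.51}$$

$$\sum_{d=1}^{n+1}(n-2d+1)^2\binom n{d-1}\lambda_d=n+1.\tag{2.52}$$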



(^-2rf+l)2 (^^ + ^Y\ _ 2(n + 3)(2n-3) 
^4:^^ (n + l)(n + 2) Vc? + 1/ (n + l)(n + 2)M' ^' ' 
^ (n + 1) \ d



d=0 



1 r(5/2 - m) r(2 + n - c/ - m) r(i + « + - m) , , 

-(a — M + 1/2) 



r(5/2 + n/2-M)r(2 + n/2-M)r(i-M) 

= (48 + 64a + 25a^ + + 40n + 66ara + 37a^n + 5a^n + + 14an^ + %o?r? 
+ la^r? - 152u - 138au - 34a^M - ^ot'u - 92nu - 92anu - ?,2a^nu - 2a^nu 
- I2n^u - lOan^M - 2a^n^u + 176^^ + lOOau^ + 12a\2 + QMu^ + 320^^2 
+ 4a2raM2 + 4n\2 - 88^^ - 24au^ - IQnv!^ + 16m^) 

r(5 - 2m) r(3 + a + n - 2m) r(l + a - n) 

^ 4r(5 + a-2u)r(4 + n-2M)r(3-M) *^ ■ '* 



n+1 

E 

d=0 



'n-2rf+l)V^ + l 



\d{d - u + l/2)i){l + d - u) 



(n + 1) V 

_ 32 + 33n + - 69m - 46nM - hn^u + SOm^ + 16nM2 - 12^3 
~ 2(2-m)(1-m) (3 + n-2M) 

+ (n + 2 - 2m)(^(1 - u) +i){n + ?,- 2u) - ^/'(S - 2u)). (2.55) 



Proof. Identities (2.51), (2.52), (2.53), (2.54) are proved by splitting the sums appropriately so that each part can be summed by means of Gauß' ${}_2F_1$ summation. Identity (2.55) follows from (2.54) by differentiating with respect to $a$ and then setting $a=0$. □

From (2.55) we can deduce the following important estimation. The result and its proof were kindly reported to us by Peter Grabner.

Lemma 14. We have the asymptotic expansion: 
[n - 2o? + 1) + 1' 



E 

d=0 



(n + 1) 



n ( log n + 



d 



\d{d-u + 1/2) log(l + ci-M) 



7 -5m 



2(2-M)(l-n; 



- ^(5 - 2m) + ^/'(l - m) + (2 - 2m) logn 



26 — 467/ -I- 25m2 — Av'^ / 1 

+ , + (-2 + 2m) V^(5 - 2m) + (2 - 2m) ^(1 - m) + O ^ 



2(2-m)(1-m) 



n 



l~u 



(2.56) 
Proof. We use the asymptotic expansion

$$\psi(z)=\log(z)-\frac1{2z}+O\Bigl(\frac1{z^2}\Bigr).\tag{2.57}$$

In particular, this gives 

"'(^ + = '°«<^ + + « V (<i + 2) 

and 

+ 3 - 2.) = log(n H- 2 - 2.) + ^^^^^i-^ + O (1 

Using this in (2.55), we obtain



H n+l 



n - 2d + If fn + 1\ {d - u + 1 /2) 



(^ + 1) \ d J " d+1 
{n-2d+lf fn + l\ id~u + l/2) 
^ \h (^ + 1) V d J \d+l)id + 2)^ 

-22 + 40m - 23^2 + 4n3 ^ ^ ^ ^ ^ s . ^ 

+ 2 - 2u) log n — — ^ + -2 + 2u) ^ 5 - 2u) + 2 - 2u) ^ 1 - m 

2 (2 — m) (1 — u) 

In the first expression on the right-hand side of (2.58) we use the trivial identity

$$\frac{d-u+1/2}{d+1}=1-\frac{u+1/2}{d+1},$$

to split the expression into two sums, one of which can be evaluated by means of (2.51). The other sum equals basically $-(u+1/2)$ times the sum on the left-hand side of (2.53). What is missing is the summand for $d=-1$. By (2.53), the complete sum is of the order $O(1/n)$. Using Stirling's formula it is seen that the summand for $d=-1$ is of the order $O(1/n^{1-u})$. So, combining everything, the first expression in (2.58) equals $1+O(1/n)+O(1/n^{1-u})=1+O(1/n^{1-u})$. For the second expression, we do a similar partial fraction expansion in order to apply (2.53). The result is that this second expression is of the order $O(1/n^{1-u})$. This establishes the Lemma. □

Now we are in a position to prove the Theorem.

Proof of the Theorem. Since $\lambda_{n+1-d} = \lambda_d$, an equivalent expression for the left-hand side in (2.50) is

$$-\frac{1}{2}\sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d \log\lambda_d. \qquad (2.59)$$

Now, we expand the logarithm according to the addition rule to obtain

$$-\frac{1}{2}\sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d\, \log\frac{\Gamma(5/2-u)}{2^{n}\,\Gamma(5/2+n/2-u)\,\Gamma(2+n/2-u)\,\Gamma(1-u)}$$
$$\quad -\frac{1}{2}\sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d\,\big(\log\Gamma(1+d-u) + \log\Gamma(2+n-d-u)\big).$$

The first sum in this expression can be evaluated by means of (2.51). Therefore, we obtain for the expression on the left-hand side of (2.50)

$$n\log 2 - \log\Gamma(5/2-u) + \log\Gamma(1-u) + \log\Gamma(5/2+n/2-u) + \log\Gamma(2+n/2-u)$$
$$\quad - \sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d \log\Gamma(1+d-u). \qquad (2.60)$$



The only difficulty in obtaining the asymptotics of expression (2.60) stems from the sum. In this sum, we use Stirling's formula

$$\log\Gamma(x) = (x - 1/2)\log x - x + \frac{1}{2}\log 2 + \frac{1}{2}\log\pi + O\!\left(\frac{1}{x}\right)$$

to get

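$$\sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d \Big((1/2+d-u)\log(1+d-u) - 1 + u - d + \frac{1}{2}\log 2 + \frac{1}{2}\log\pi + O\big(\tfrac{1}{d+1}\big)\Big)$$
$$= \sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d \Big(u - 1 + \frac{1}{2}\log 2 + \frac{1}{2}\log\pi\Big) - \sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d\, d + O\Big(\sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\frac{\lambda_d}{d+1}\Big)$$
$$\quad + \sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d\,(1/2-u+d)\log(1-u+d). \qquad (2.61)$$

The first expression on the right-hand side of (2.61) simplifies by means of (2.51), the second by means of (2.52). For the $O(\cdot)$ term we use (2.53). In fact, the sum on the left-hand side of (2.53) differs from the sum in the $O(\cdot)$ term only by the summand for $d=-1$. This summand is of the order $O(1/n^{1-u})$, as is seen by Stirling's formula. Putting everything together, we obtain

$$\sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d \log\Gamma(1+d-u) = 2u - 2 + \log 2 + \log\pi - (n+1) + O\!\left(\frac{1}{n^{1-u}}\right)$$
$$\quad + \sum_{d=0}^{n+1} \frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}\,\lambda_d\,(1/2-u+d)\log(1-u+d). \qquad (2.62)$$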



When we use this in (2.60), apply Lemma 3 to the remaining sum, and simplify, we finally arrive at (2.50). □
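The sums manipulated in this proof can be checked numerically for small n. The following is a minimal sketch, not the authors' code. It assumes that the eigenvalue $\lambda_d$ is the Gamma-function quotient whose logarithm is expanded in the proof, namely $\Gamma(5/2-u)\Gamma(2+n-d-u)\Gamma(1+d-u)$ divided by $2^n\Gamma(5/2+n/2-u)\Gamma(2+n/2-u)\Gamma(1-u)$. The function names are hypothetical.

```python
import math

def eigenvalue(n, d, u):
    """lambda_d of the Bayesian density matrix; closed form assumed from the proof."""
    log_lam = (math.lgamma(2.5 - u) + math.lgamma(2 + n - d - u)
               + math.lgamma(1 + d - u) - n * math.log(2)
               - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
               - math.lgamma(1 - u))
    return math.exp(log_lam)

def weight(n, d):
    """The factor (n - 2d + 1)^2 / (n + 1) * binom(n + 1, d) used in every sum."""
    return (n - 2 * d + 1) ** 2 * math.comb(n + 1, d) / (n + 1)

def entropy(n, u):
    """Von Neumann entropy written as the symmetric sum over d = 0, ..., n + 1."""
    total = 0.0
    for d in range(n + 2):
        lam = eigenvalue(n, d, u)
        total += weight(n, d) * lam * math.log(lam)
    return -0.5 * total

n, u = 6, 0.3
# The symmetric sum counts every eigenvalue twice, so it should be close to 2.
full_sum = sum(weight(n, d) * eigenvalue(n, d, u) for d in range(n + 2))
# By the symmetry d <-> n + 1 - d, the d-weighted sum should be close to n + 1.
mean_d = sum(weight(n, d) * eigenvalue(n, d, u) * d for d in range(n + 2))
print(full_sum, mean_d, entropy(n, u))
```

For n = 1 both eigenvalues equal 1/2, so `entropy(1, u)` should return log 2 for any admissible u, which gives a quick sanity check of the assumed closed form.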

3. Comparison of our asymptotic redundancies for the one-parameter family q(u) with those of Clarke and Barron

Let us, first, compare the formula (1.3) for the asymptotic redundancy of Clarke and Barron to that derived here (2.36) for the two-level quantum systems, in terms of the one-parameter family of probability densities $q(u)$, $-\infty < u < 1$, given in (1.10). Since the unit ball or Bloch sphere of such systems is three-dimensional in nature, we are led to set the dimension $d$ of the parameter space in (1.3) to 3. The quantum Fisher information matrix $I(\theta)$ for that case was taken to be (1.8), while the role of the probability function $w(\theta)$ is played by $q(u)$. Under these substitutions, it was seen in the Introduction that formula (1.3) reduces to (1.12). Then, we see that for $0 < r < 1$, the formulas (2.36) and (1.12) coincide except for the presence of the monotonically decreasing (nonclassical/quantum) term $\frac{1}{2r}\log\left(\frac{1-r}{1+r}\right)$ in (2.36) (see Figure 2 for a plot of this term; $\log 2 \approx .693147$ "nats" of information equal one "bit"). (This term would have to be replaced by $-1$, that is, its limit for $r \to 0$, to give (1.12).) In particular, the order of magnitude, $\frac{3}{2}\log n$, is precisely the same in both formulas. For the particular case $r = 0$, the asymptotic formula (2.36) (see (2.37)) precisely coincides with (1.12).
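The behaviour of the nonclassical term is easy to tabulate. The sketch below is illustrative only, and the function name is hypothetical. It takes the term to be $\frac{1}{2r}\log\frac{1-r}{1+r}$, which is the reading consistent with the stated limit $-1$ for $r \to 0$ and with the range plotted in Figure 2.

```python
import math

def quantum_term(r):
    """(1 / (2r)) * log((1 - r) / (1 + r)) for 0 < r < 1, in nats."""
    return math.log((1 - r) / (1 + r)) / (2 * r)

radii = (1e-4, 0.2, 0.4, 0.6, 0.8, 0.99)
values = [quantum_term(r) for r in radii]
for r, v in zip(radii, values):
    print(r, round(v, 4))
```

The first value is close to $-1$ and the sequence decreases as r grows, diverging logarithmically as r approaches 1.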

[Figure 2: The nonclassical/quantum term $\frac{1}{2r}\log\frac{1-r}{1+r}$ in the quantum asymptotic redundancy (2.36), in nats, plotted against r for 0 < r < 1.]

In the case $r = 1$, however, i.e., when we consider the boundary of the parameter space (represented by the unit sphere), the situation is slightly tricky. Due to the fact that the formula of Clarke and Barron holds only for interior points of the parameter space, we cannot expect that, in general, our formula will resemble that of Clarke and Barron. However, if the probability density, $q(u)$, is concentrated on the boundary of the sphere, then we may disregard the interior of the sphere, and may consider the boundary of the sphere as the true parameter space. This parameter space is two-dimensional and consists of interior points throughout. Indeed, the probability density $q(u)$ is concentrated on the boundary of the sphere if we choose $u = 1$ since, as we remarked in the Introduction, in the limit $u \to 1$, the distribution determined by $q(u)$ tends to the uniform distribution over the boundary of the sphere. Let us, again, (naively) attempt to apply Clarke and Barron's formula (1.3) to that case. We parameterize the boundary of the sphere by polar coordinates $(\vartheta, \varphi)$,

$$x = \sin\vartheta\cos\varphi, \qquad y = \sin\vartheta\sin\varphi, \qquad z = \cos\vartheta, \qquad 0 \le \varphi < 2\pi,\ 0 \le \vartheta \le \pi.$$

The probability density induced by $q(u)$ in the limit $u \to 1$ then is $\sin\vartheta/4\pi$, the density of the uniform distribution. Using [0, eq. 8], the quantum (symmetric logarithmic derivative) Fisher information matrix turns out to be

$$\begin{pmatrix} 1 & 0 \\ 0 & \sin^2\vartheta \end{pmatrix}, \qquad (3.1)$$

its determinant equalling, therefore, $\sin^2\vartheta$. So, setting $d = 2$ and substituting $\sin\vartheta/4\pi$ for $w(\theta)$ and $\sin^2\vartheta$ for $\det I(\theta)$ in (1.3) gives $\log n + \log 2 - 1$. On the other hand, our formula (2.38), for $u = 1$, gives $\log n$. So, again, the terms differ only by a constant. In particular, the order of magnitude is again the same.
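The substitution into the Clarke and Barron formula can be replayed numerically. This is a sketch under an assumption: it takes (1.3) to have the standard form $\frac{d}{2}\log\frac{n}{2\pi e} + \frac{1}{2}\log\det I(\theta) - \log w(\theta)$, which is not restated in this section. The helper name is hypothetical.

```python
import math

def clarke_barron(n, dim, det_fisher, prior_density):
    """Assumed form of (1.3): (d/2) log(n / (2 pi e)) + (1/2) log det I - log w."""
    return ((dim / 2) * math.log(n / (2 * math.pi * math.e))
            + 0.5 * math.log(det_fisher) - math.log(prior_density))

n = 1000
offsets = []
for theta in (0.3, 1.0, 2.5):
    value = clarke_barron(n, 2, math.sin(theta) ** 2,
                          math.sin(theta) / (4 * math.pi))
    offsets.append(value - math.log(n))   # constant left after removing log n
print(offsets)
```

The dependence on the polar angle cancels between the Fisher term and the prior term, so each offset should equal $\log 2 - 1$, the constant by which the two formulas differ.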

Let us now focus our attention on the asymptotic minimax redundancy (1.4) of Clarke and Barron. If in (1.4) we again set $d$ to 3, we obtain (1.11). Clarke and Barron prove that this minimax expression is only attained by the (classical) Jeffreys' prior. In order to derive its quantum counterpart (at least, a version restricted to the family $q(u)$) we have to determine the behavior of

$$\min_{-\infty<u<1}\ \max_{0\le r\le 1}\ S(\otimes^n\rho, \zeta_n(u)) \qquad (3.2)$$

for $n \to \infty$. We are unable to proceed in a fully rigorous manner. However, from computational data we conjecture that

$$\max_{0\le r\le 1}\ S(\otimes^n\rho, \zeta_n(u)) \qquad (3.3)$$

is always attained at $r = 0$ (corresponding to the fully mixed state) or $r = 1$ (corresponding to a pure state). Assuming the validity of this conjecture, the value $u_n$ of $u$ at which the minimum in (3.2) is attained is a value for which $S(\otimes^n\rho, \zeta_n(u))|_{r=0}$ equals $S(\otimes^n\rho, \zeta_n(u))|_{r=1}$. Then we are able to prove that $\lim_{n\to\infty} u_n = .5$.

Namely, by our assumption we have

$$S(\otimes^n\rho, \zeta_n(u_n))|_{r=0} = S(\otimes^n\rho, \zeta_n(u_n))|_{r=1}, \qquad (3.4)$$

for any $n$. Let $(u_{n_k})_{k=1,2,\ldots}$ be a subsequence of the sequence $(u_n)$ which converges to some $u_0$, $-\infty \le u_0 \le 1$. Note that we allow $u_0 = -\infty$ and $u_0 = 1$. Therefore, there is always such a subsequence. Because of (3.4) we must have

$$\lim_{k\to\infty} \frac{S(\otimes^{n_k}\rho, \zeta_{n_k}(u_{n_k}))|_{r=0}}{\log n_k} = \lim_{k\to\infty} \frac{S(\otimes^{n_k}\rho, \zeta_{n_k}(u_{n_k}))|_{r=1}}{\log n_k}. \qquad (3.5)$$

By (2.37) and the fact that the error term in (2.37) is uniform in $u$, we know that the left-hand side in (3.5) is 3/2. On the other hand, by (2.38) and the fact that the error term in (2.38) is uniform in $u$, the right-hand side in (3.5) equals $\lim_{k\to\infty}(2 - u_{n_k})$. Hence, we must have $\lim_{k\to\infty} u_{n_k} = .5$. Thus, every convergent subsequence of $(u_n)$ (including those which converge to $-\infty$ or 1, the boundary points of the interval of possible values of $u_n$) converges to .5. Hence, the complete sequence $(u_n)$ converges to .5, establishing our claim. Since we have regarded $q(.5)$, that is (1.9), as the quantum counterpart of the Jeffreys' prior (because, by analogy with the classical situation, it is the normalized square root of the determinant of the quantum Fisher information matrix, $\sqrt{\det I(\theta)}$), this result could be considered to be fully parallel to that of Clarke and Barron.

We now concern ourselves with the asymptotic maximin redundancy. Clarke and Barron [17, 18] prove that the maximin redundancy is attained asymptotically, again, by the Jeffreys' prior. To derive the quantum counterpart of the maximin redundancy within our analytical framework, we would have to calculate

$$\max_{w}\ \min_{Q_n} \int S(\otimes^n\rho, Q_n)\, w(x,y,z)\, dx\, dy\, dz, \qquad (3.6)$$

where $Q_n$ varies over the $(2^{2n} - 1)$-dimensional convex set of $2^n \times 2^n$ density matrices and $w$ varies over all probability densities over the unit ball. In the classical case, due to a result of Aitchison [0, pp. 549/550], the minimum is achieved by setting $Q_n$ to be the Bayes estimator, i.e., the average of all possible $Q_n$'s with respect to the given probability distribution. In the quantum domain the same assertion is true. For the sake of completeness, we include the proof in the Appendix. We can, thus, take the quantum analog of the Bayes estimator to be the Bayesian density matrix $\zeta_n(u)$. That is, we set $Q_n = \zeta_n(u)$ in (3.6). Let us, for the moment, restrict the possible $w$'s over which the maximum is to be taken to the family $q(u)$, $-\infty < u < 1$. Thus, we consider

$$\max_{-\infty<u<1} \int_{x^2+y^2+z^2\le 1} S(\otimes^n\rho, \zeta_n(u))\, q(u)\, dx\, dy\, dz. \qquad (3.7)$$

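By the definition (1.5) of relative entropy, we have

$$S(\otimes^n\rho, \zeta_n(u)) = \mathrm{Tr}(\otimes^n\rho\, \log \otimes^n\rho) - \mathrm{Tr}(\otimes^n\rho\, \log \zeta_n(u))$$
$$= n\left(\frac{1-r}{2}\log\frac{1-r}{2} + \frac{1+r}{2}\log\frac{1+r}{2}\right) - \mathrm{Tr}(\otimes^n\rho\, \log \zeta_n(u)),$$

the second line being due to (1.7). Therefore, we get

$$\int_{x^2+y^2+z^2\le 1} S(\otimes^n\rho, \zeta_n(u))\, q(u)\, dx\, dy\, dz$$
$$= n \int_0^1\!\!\int_0^{\pi}\!\!\int_0^{2\pi} \left(\frac{1-r}{2}\log\frac{1-r}{2} + \frac{1+r}{2}\log\frac{1+r}{2}\right) q(u)\, r^2 \sin\vartheta\, d\varphi\, d\vartheta\, dr - \mathrm{Tr}(\zeta_n(u)\log\zeta_n(u))$$
$$= -n\left(-\frac{7-5u}{2(2-u)(1-u)} + \psi(5-2u) - \psi(1-u)\right) + S(\zeta_n(u)). \qquad (3.8)$$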
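The coefficient of n in (3.8) is the average of $\mathrm{Tr}\,\rho\log\rho$ over the unit ball with respect to $q(u)$. The sketch below compares a numerical average with a closed form. Both ingredients are assumptions of this example rather than statements quoted from the text: the radial weight $r^2(1-r^2)^{-u}$ for $q(u)$, and the closed form $\frac{7-5u}{2(2-u)(1-u)} - \psi(5-2u) + \psi(1-u)$. The digamma routine is a plain recurrence-plus-asymptotic-series implementation, and all names are hypothetical.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x upward, then use the standard asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return acc + math.log(x) - 0.5 / x - series

def xlogx(p):
    return p * math.log(p) if p > 0.0 else 0.0

def minus_entropy(r):
    """Tr(rho log rho) for a qubit state with Bloch radius r."""
    return xlogx((1 - r) / 2) + xlogx((1 + r) / 2)

def averaged_numeric(u, steps=20000):
    """Midpoint rule after substituting r = sin t (intended for u <= 1/2)."""
    h = (math.pi / 2) / steps
    num = den = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        r = math.sin(t)
        w = r * r * math.cos(t) ** (1 - 2 * u)   # r^2 (1 - r^2)^(-u) dr
        num += w * minus_entropy(r)
        den += w
    return num / den

def averaged_closed(u):
    return ((7 - 5 * u) / (2 * (2 - u) * (1 - u))
            - digamma(5 - 2 * u) + digamma(1 - u))

for u in (0.0, 0.5, -1.0):
    print(u, averaged_numeric(u), averaged_closed(u))
```

For u = 0 (the uniform distribution over the ball) both columns should be close to $-1/3$, that is, an average von Neumann entropy of 1/3 nats.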



From Theorem 12, we know the asymptotics of the von Neumann entropy $S(\zeta_n(u))$. Hence, we find that the expression (3.8) is asymptotically equal to

$$\frac{3}{2}\log n + \left(-\frac{7}{2} + 2u\right)\log 2 - \frac{14 - 20u + 7u^2}{2(2-u)(1-u)} + \log\Gamma(1-u) - \log\Gamma(5/2-u)$$
$$\quad + (2-2u)\big(\psi(5-2u) - \psi(1-u)\big) + O\!\left(\frac{1}{n^{1-u}}\right). \qquad (3.9)$$

We have to, first, perform the maximization required in (3.7), and then determine the asymptotics of the result. Due to the form of the asymptotics in (3.9), we can, in fact, derive the proper result by proceeding in the reverse order. That is, we first determine the asymptotics of $\int S(\otimes^n\rho, \zeta_n(u))\, q(u)\, dx\, dy\, dz$, which we did in (3.9), and then we maximize the $u$-dependent part in (3.9) with respect to $u$ (ignoring the error term). (In Figure 3 we display this $u$-dependent part over the range $[-0.2, 1]$.) Of course, we do the latter step by equating the first derivative of the $u$-dependent part in (3.9) with respect to $u$ to zero and solving for $u$. It turns out that this equation takes the appealingly simple form

$$2(1-u)^3\big(\psi'(1-u) - \psi'(5/2-u)\big) = 1. \qquad (3.10)$$

Numerically, we find this equation to have the solution $u \approx .531267$, at which the asymptotic maximin redundancy assumes the value $\frac{3}{2}\log n - 1.77185 + O(1/n^{.468733})$. For $u = .5$, on the other hand, we have for the asymptotic minimax redundancy, $\frac{3}{2}\log n - 2 - \frac{1}{2}\log 2 + \frac{1}{2}\log\pi + O(1/\sqrt{n}) = \frac{3}{2}\log n - 1.77421 + O(1/\sqrt{n})$. We must, therefore, conclude that, in contrast to the classical case [17, 18], our trial candidate ($q(.5)$) for the quantum counterpart of Jeffreys' prior cannot serve as a "reference prior," in the sense introduced by Bernardo P, p|.
[Figure 3: The u-dependent part of the asymptotic Bayes redundancy (3.9), in nats, plotted against u; the maximum is at u = .531267.]
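The numbers quoted around (3.10) can be reproduced with a few lines of standard-library Python. This is a sketch under two explicit assumptions. First, the exponent in (3.10) is read as 3. Second, the u-dependent part of (3.9) is read as $(-\frac{7}{2}+2u)\log 2 - \frac{14-20u+7u^2}{2(2-u)(1-u)} + \log\Gamma(1-u) - \log\Gamma(5/2-u) + (2-2u)(\psi(5-2u)-\psi(1-u))$. Both readings are the ones that reproduce the quoted root and value. The polygamma routines and all names are hypothetical helpers.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via upward recurrence and the asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return acc + math.log(x) - 0.5 / x - series

def trigamma(x):
    """psi'(x) for x > 0 via upward recurrence and the asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return acc + inv + 0.5 * inv2 + tail

def stationarity(u):
    """Left-hand side of (3.10) minus 1, with the exponent assumed to be 3."""
    return 2 * (1 - u) ** 3 * (trigamma(1 - u) - trigamma(2.5 - u)) - 1

def u_dependent_part(u):
    """Assumed u-dependent part of (3.9), without the (3/2) log n term."""
    return ((-3.5 + 2 * u) * math.log(2)
            - (14 - 20 * u + 7 * u * u) / (2 * (2 - u) * (1 - u))
            + math.lgamma(1 - u) - math.lgamma(2.5 - u)
            + (2 - 2 * u) * (digamma(5 - 2 * u) - digamma(1 - u)))

# Bisection: stationarity is positive at 0.4 and negative at 0.7.
lo, hi = 0.4, 0.7
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if stationarity(mid) > 0:
        lo = mid
    else:
        hi = mid
u_star = 0.5 * (lo + hi)
minimax_constant = -2 - 0.5 * math.log(2) + 0.5 * math.log(math.pi)
print(u_star, u_dependent_part(u_star), minimax_constant)
```

The printed root should be near .531267, the maximin constant near $-1.77185$, and the minimax constant for u = .5 near $-1.77421$.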

Since they are mixtures of product states, the matrices $\zeta_n(u)$ are classically (as opposed to EPR, Einstein-Podolsky-Rosen) correlated [^. Therefore, $S(\zeta_n(u))$ must not be less than the sum of the von Neumann entropies of any set of reduced density matrices obtained from it, through computation of partial traces. For positive integers, $n_1 + n_2 + \cdots = n$, the corresponding reduced density matrices are simply $\zeta_{n_1}(u), \zeta_{n_2}(u), \ldots$, due to the mixing [exercise 7.10]. Using these reduced density matrices, one can compute conditional density matrices and quantum entropies [13]. Clarke and Barron [p. 40] have an alternative expression for the redundancy in terms of conditional entropies, and it would be of interest to ascertain whether a quantum analogue of this expression exists.

Let us note that the theorem of Clarke and Barron utilized the uniform convergence property of the asymptotic expansion of the relative entropy (Kullback-Leibler divergence). Condition 2 in their paper is, therefore, crucial. It assumes (as is typically the case classically) that the matrix of second derivatives, $J(\theta)$, of the relative entropy is identical to the Fisher information matrix $I(\theta)$. In the quantum domain, however, in general, $J(\theta) \ge I(\theta)$, where $J(\theta)$ is the matrix of second derivatives of the quantum relative entropy (1.5) and $I(\theta)$ is the symmetric logarithmic derivative Fisher information matrix [42, 43]. The equality holds only for special cases. For instance, $J(\theta) > I(\theta)$ does hold if $r \ne 0$ for the situation considered in this paper. The volume element of the Kubo-Mori/Bogoliubov (monotone) metric [42, 43] is given by $\sqrt{\det J(\theta)}$. This can be normalized for the two-level quantum systems to be a member ($u = 1/2$) of a one-parameter family of probability densities

$$\frac{(1-u)\,\Gamma(5/2-u)\; r \log\big((1+r)/(1-r)\big)\, \sin\vartheta}{\pi^{3/2}\,(3-2u)\,\Gamma(1-u)\,(1-r^2)^{u}}, \qquad -\infty < u < 1, \qquad (3.11)$$

and similarly studied, it is presumed, in the manner of the family $q(u)$ (cf. (1.10) and (2.5)) analyzed here. These two families can be seen to differ (up to the normalization factor) by the replacement of $\log((1+r)/(1-r))$ in (3.11) by, simply, $r$. (These two last expressions are, of course, equal for $r = 0$.) In general, the volume element of a monotone metric over the two-level quantum systems is of the form [eq. 3.17]

$$\frac{r^2 \sin\vartheta}{f\big((1-r)/(1+r)\big)\,(1-r^2)^{1/2}\,(1+r)}, \qquad (3.12)$$

where $f : \mathbb{R}^+ \to \mathbb{R}^+$ is an operator monotone function such that $f(1) = 1$ and $f(t) = t f(1/t)$. For $f(t) = (1+t)/2$, one recovers the volume element ($\sqrt{\det I(\theta)}$) of the metric of the symmetric logarithmic derivative, and for $f(t) = (t-1)/\log t$, that ($\sqrt{\det J(\theta)}$) of the Kubo-Mori/Bogoliubov metric gl], ||, ||. (It would appear, then, that the only member of the family $q(u)$ proportional to a monotone metric is $q(.5)$, that is (1.9). The maximin result we have obtained above corresponding to $u \approx .531267$, the solution of (3.10), would appear unlikely, then, to extend globally beyond the family.) While $J(\theta)$ can be generated from the relative entropy (1.5) (which is a limiting case of the $\alpha$-entropies ^3), $I(\theta)$ is similarly obtained from [eq. 3.16]

$$\mathrm{Tr}\,\rho_1 (\log\rho_1 - \log\rho_2)^2. \qquad (3.13)$$

It might prove of interest to repeat the general line of analysis carried out in this paper, but with the use of (3.13) rather than (1.5). Also of importance might be an analysis in which the relative entropy (1.5) is retained, but the family (3.11) based on the Kubo-Mori/Bogoliubov metric is used instead of $q(u)$. Let us also indicate that if one equates the asymptotic redundancy formula of Clarke and Barron (1.3) (using $w(\theta) = q(u)$) to that derived here (2.36), neglecting the residual terms, solves for $\det I(\theta)$, and takes the square root of the result, one obtains a prior of the form (3.12) based on the monotone function ti+t.

As we said in the Introduction, ideally we would like to start with a (suitably well-behaved) arbitrary probability density on the unit ball, determine the relative entropy of $\otimes^n\rho$ with respect to the average of $\otimes^n\rho$ over the probability density, then find its asymptotics, and finally, among all such probability densities, find the one(s) for which the minimax and maximin are attained. In this regard, we wish to mention that a suitable combination of results and computations from Sec. 2 with basic facts from representation theory of $SU(2)$ (cf. [57, 10] for more information on that topic) yields the following result.
Theorem 15. Let $w$ be a spherically symmetric probability density on the unit ball, i.e., $w = w(x,y,z)$ depends only on $r = \sqrt{x^2+y^2+z^2}$. Furthermore, let $\zeta_n(w)$ be the average $\int_{x^2+y^2+z^2\le 1} (\otimes^n\rho)\, w\, dx\, dy\, dz$. Then the eigenvalues of $\zeta_n(w)$ are

$$\lambda_d = \frac{\pi}{2^{n-1}(n-2d+1)} \int_{-1}^{1} r\,(1+r)^{n-d+1}(1-r)^{d}\, w(|r|)\, dr, \qquad d = 0, 1, \ldots, \lfloor n/2 \rfloor, \qquad (3.14)$$

with respective multiplicities

$$\frac{(n-2d+1)^2}{n+1}\binom{n+1}{d}, \qquad (3.15)$$

and corresponding eigenspaces as given by (2.16).

The relative entropy of $\otimes^n\rho$ with respect to $\zeta_n(w)$ is given by (log ), with $\lambda_d$ as given in (3.14).

We hope that this Theorem enables us to determine the asymptotics of the relative entropy and, eventually, to find, at least within the family of spherically symmetric probability densities on the unit ball, the corresponding minimax and maximin redundancies.
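The integral formula of Theorem 15 can be tried out on the uniform density over the ball, which is the member u = 0 of the family q(u). The sketch below is illustrative only. It assumes the prefactor $\pi / (2^{n-1}(n-2d+1))$ in (3.14) and, for comparison, the Gamma-function closed form of the eigenvalues for q(u) from Section 2. The function names are hypothetical, and the Simpson quadrature is only suitable for densities that stay bounded up to |r| = 1.

```python
import math

def eigenvalue_spherical(n, d, w, steps=4000):
    """Assumed form of (3.14), integrated over [-1, 1] by Simpson's rule."""
    h = 2.0 / steps   # steps must be even

    def g(r):
        return r * (1 + r) ** (n - d + 1) * (1 - r) ** d * w(abs(r))

    total = g(-1.0) + g(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(-1.0 + k * h)
    integral = total * h / 3
    return math.pi * integral / (2 ** (n - 1) * (n - 2 * d + 1))

def eigenvalue_q(n, d, u):
    """Closed form for the family q(u), assumed from Section 2."""
    log_lam = (math.lgamma(2.5 - u) + math.lgamma(2 + n - d - u)
               + math.lgamma(1 + d - u) - n * math.log(2)
               - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
               - math.lgamma(1 - u))
    return math.exp(log_lam)

def uniform_density(r):
    """Uniform probability density on the unit ball, i.e. q(0)."""
    return 3 / (4 * math.pi)

n = 4
for d in range(n // 2 + 1):
    print(d, eigenvalue_spherical(n, d, uniform_density), eigenvalue_q(n, d, 0.0))
```

For n = 4 the two columns should agree, with values 1/7 for d = 0 and 1/70 for d = 2.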

4. Summary 

Clarke and Barron [17, 18] (cf. ^^) have derived several forms of asymptotic redundancy for arbitrarily parameterized families of probability distributions. We have been motivated to undertake this study by the possibility that their results may generalize, in some yet not fully understood fashion, to the quantum domain of noncommutative probability. (Thus, rather than probability densities, we have been concerned here with density matrices.) We have only, so far, been able to examine this possibility in a somewhat restricted manner. By this, we mean that we have limited our consideration to two-level quantum systems (rather than n-level ones, n > 2), and for the case n = 2, we have studied (what has proven to be) an analytically tractable one-parameter family of possible prior probability densities, $q(u)$, $-\infty < u < 1$ (rather than the totality of arbitrary probability densities). Consequently, our results cannot be as definitive in nature as those of Clarke and Barron. Nevertheless, the analyses presented here indicate that our trial candidate ($q(.5)$, that is (1.9)) for the quantum counterpart of the Jeffreys' prior plays a somewhat similarly privileged, but less pronounced, role as in the classical case.

Future research might be devoted to expanding the family of probability distributions used to generate the Bayesian density matrices for n = 2, as well as similarly studying the n-level quantum systems (n > 2). (In this regard, we have examined the situation in which $n = 2^m$, and the only $n \times n$ density matrices considered are simply the tensor products of $m$ identical $2 \times 2$ density matrices. Surprisingly, for m = 2, 3, the associated trivariate candidate quantum Jeffreys' prior, taken, as throughout this study, to be proportional to the volume elements of the metrics of the symmetric logarithmic derivative (cf. [0), has been found to be improper (nonnormalizable) over the Bloch sphere. The minimality of such metrics is guaranteed, however, only if "the whole state space of a spin is parameterized".) In all such cases, it will be of interest to evaluate the characteristics of the relevant candidate quantum Jeffreys' prior vis-a-vis all other members of the family of probability distributions employed over the $(n^2 - 1)$-dimensional convex set of $n \times n$ density matrices.

We have also conducted analyses parallel to those reported above, but having, ab initio, set either x or y to zero in the $2 \times 2$ density matrices (1.6). This, then, places us in the realm of real (as opposed to complex, i.e., standard or conventional) quantum mechanics. (Of course, setting both x and y to zero would return us to a strictly classical situation, in which the results of Clarke and Barron [17, 18], as applied to binomial distributions, would be directly applicable.) Though we have, on the basis of detailed computations, developed strong conjectures as to the nature of the associated results, we have not, at this stage of our investigation, yet succeeded in formally demonstrating their validity.

In conclusion, again in analogy to classical results, we would like to raise the possibility that the quantum asymptotic redundancies derived here might prove of value in deriving formulas for the stochastic complexity [^5|, ^] (cf. [^), that is, the shortest description length, of a string of n quantum bits. The competing possible models for the data string might be taken to be the $2 \times 2$ density matrices ($\rho$) corresponding to different values of r, or equivalently, different values of the von Neumann entropy, $S(\rho)$.



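Appendix: The quantum Bayes estimator achieves the minimum average entropy

Let $P_\theta$, $\theta \in \Theta$, be a family of density matrices, and let $w(\theta)$, $\theta \in \Theta$, be a probability distribution on $\Theta$.

Theorem 16. The minimum

$$\min_{Q} \int w(\theta)\, S(P_\theta, Q)\, d\theta,$$

taken over all density matrices $Q$, is achieved by $m = \int w(\theta)\, P_\theta\, d\theta$.

Proof. We look at the difference

$$\int w(\theta)\, S(P_\theta, Q)\, d\theta - \int w(\theta)\, S(P_\theta, m)\, d\theta,$$

and show that it is nonnegative. Indeed,

$$\int w(\theta)\, S(P_\theta, Q)\, d\theta - \int w(\theta)\, S(P_\theta, m)\, d\theta$$
$$= \int w(\theta)\, \mathrm{Tr}(P_\theta \log P_\theta - P_\theta \log Q)\, d\theta - \int w(\theta)\, \mathrm{Tr}(P_\theta \log P_\theta - P_\theta \log m)\, d\theta$$
$$= \int w(\theta)\, \mathrm{Tr}\big(P_\theta (\log m - \log Q)\big)\, d\theta$$
$$= \mathrm{Tr}\Big(\Big(\int w(\theta)\, P_\theta\, d\theta\Big)(\log m - \log Q)\Big)$$
$$= \mathrm{Tr}\big(m (\log m - \log Q)\big)$$
$$= S(m, Q) \ge 0,$$

since relative entropies are nonnegative. □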
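The identity at the heart of this proof can be illustrated for qubits, where relative entropies have a closed form in terms of Bloch vectors. The sketch below is an illustration under stated assumptions, not part of the original argument. It uses a finite family of three states with weights in place of the integral over $\theta$, and the states, weights and function names are made up for the example.

```python
import math

def xlogx(p):
    return p * math.log(p) if p > 0.0 else 0.0

def tr_rho1_log_rho2(v1, v2):
    """Tr(rho1 log rho2) for qubits with Bloch vectors v1 and v2, |v2| < 1."""
    r2 = math.sqrt(sum(c * c for c in v2))
    log_plus = math.log((1 + r2) / 2)
    log_minus = math.log((1 - r2) / 2)
    if r2 == 0.0:
        return log_plus
    dot = sum(a * b for a, b in zip(v1, v2))
    return 0.5 * (log_plus + log_minus) + 0.5 * (log_plus - log_minus) * dot / r2

def relative_entropy(v1, v2):
    """S(rho1, rho2) = Tr rho1 (log rho1 - log rho2)."""
    r1 = math.sqrt(sum(c * c for c in v1))
    return xlogx((1 + r1) / 2) + xlogx((1 - r1) / 2) - tr_rho1_log_rho2(v1, v2)

# Hypothetical finite family P_i with weights w_i (stand-ins for P_theta, w(theta)).
states = [(0.9, 0.0, 0.0), (0.0, 0.5, 0.2), (-0.3, 0.1, 0.6)]
weights = [0.5, 0.3, 0.2]

# The Bayes average m: density matrices are linear in their Bloch vectors.
m = tuple(sum(w * v[i] for w, v in zip(weights, states)) for i in range(3))

def average_relative_entropy(target):
    return sum(w * relative_entropy(v, target) for w, v in zip(weights, states))

q_vec = (0.1, -0.2, 0.3)   # an arbitrary competing density matrix Q
gap = average_relative_entropy(q_vec) - average_relative_entropy(m)
print(gap, relative_entropy(m, q_vec))
```

The two printed numbers should coincide and be nonnegative, mirroring the chain of equalities ending in $S(m, Q) \ge 0$.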